Abstract

A fundamental problem in cognitive radio systems is that the cognitive radio is ignorant of the 
primary channel state and, hence, of the amount of actual harm it inflicts on the primary license holder. 
Sensing the primary transmitter does not help in this regard. To tackle this issue, we assume in this paper 
that the cognitive user can eavesdrop on the ACK/NACK Automatic Repeat reQuest (ARQ) fed back 
from the primary receiver to the primary transmitter. Assuming a primary channel state that follows a 
Markov chain, this feedback gives the cognitive radio an indication of the primary link quality. Based 
on the ACK/NACK received, we devise optimal transmission strategies for the cognitive radio so as to 
maximize a weighted sum of primary and secondary throughput. The actual weight used during network 
operation is determined by the degree of protection afforded to the primary link. We begin by formulating 
the problem for a channel with a general number of states. We then study a two-state model where 
we characterize a scheme that spans the boundary of the primary-secondary rate region. Moreover, we 
study a three-state model where we derive the optimal strategy using dynamic programming. We also 
extend our two-state model to a two-channel case, where the secondary user can decide to transmit on a 
particular channel or not to transmit at all. We provide numerical results for our optimal strategies and 
compare them with simple greedy algorithms for a range of primary channel parameters. Finally, we 
investigate the case where some of the parameters are unknown and are learned using hidden Markov 
models (HMM). 




I. Introduction 

Cognitive radio technology is a solution to the problem of spectrum under-utilization caused 
mainly by static spectrum allocation. In cognitive radio networks, the licensed users coexist with 
cognitive users, also known as the secondary or unlicensed users. Primary users are licensed 
users who are assigned to certain channels, and they have the right to transmit over the band 
of spectrum assigned by regulatory bodies in their respective countries. If other unlicensed, or 
secondary, users want to share the spectrum, then they must use the spectrum when it is unused 
by the primary users and/or when they can limit the interference they cause on the primary 
receivers below a certain specified level. The secondary users attempt to utilize the resources 
unused by the primary users adopting procedures that aim at protecting the primary network 
from service interruption and interference. 

There has been interest in schemes that make use of the feedback of the primary link to predict
the future behavior of the primary user and, when the primary channel is temporally
correlated, to gain knowledge about the channel between the primary transmitter and receiver
(e.g., [1], [2] and [3]). In [1], the secondary user observes the automatic repeat request (ARQ)
feedback from the primary receiver. The packet rate achieved by the primary user can be calculated by
counting the ACK feedback messages. The cognitive radio's objective is to maximize secondary
throughput under the constraint of guaranteeing a certain packet rate for the primary user. The
main difference between our work and [1] is that [1] does not exploit the possible channel
correlation across time, whereas we assume that the primary channel state follows a Markov
chain. The cognitive transmitter can hence exploit the ACK/NACK feedback messages to predict
the primary channel state during the next transmission phase. In [2], assuming a temporally
correlated channel between the primary transmitter and receiver, the cognitive transmit power is
adjusted based on primary channel state information (CSI) feedback. A real-time fading channel
model is assumed rather than a channel with a finite number of states as we consider and discuss
below. However, computing the optimal procedure in [2] is computationally prohibitive.
In [3], the cognitive radio adopts an active sensing protocol where, prior to the sensing, the
cognitive radio generates a temporary jamming signal to deliberately interfere with the primary
user message. If the primary channel is indeed active, the primary network will increase its
transmit power upon receiving the jamming signal. The cognitive radio will then receive the
power-boosted primary user signal, which is more easily detectable. Although this hidden power-feedback
loop should increase the reliability of detecting the presence of the primary user, in our proposed
scheme we assume a passive cognitive radio and a primary link that is always active, where the
cognitive radio can also transmit during the erasure states, as we show below.

There has been a series of recent works on cognitive MAC for opportunistic spectrum access, e.g., [4],
[5], [8], [9], [10], and [11]. In [4], the primary user activity remains fixed over the duration of a
slot and switches between idle and active states according to a two-state Markovian process. The
channel between the primary transmitter and receiver is not considered, and the feedback used to
predict the channel availability is provided by the secondary receiver. In our work, by contrast, the
primary channel is considered, its state changes between time slots, and the feedback
is provided by the primary receiver rather than the secondary one. In [5], the work in [4] is extended to
account for energy consumption and spectrum sensing duration optimization. In [8], the authors
focus on the ARQ messages used in primary user data-link control, which are overheard by the
secondary transmitter; the secondary transmitter can then optimize its access policy by assessing primary user reception
quality. The primary channel is assumed to be of fixed quality, resulting in two fixed and known
packet error rates corresponding to the presence and absence of secondary user transmissions. The
difference between our work and that in [8] is that in our work the primary channel has different
channel qualities depending on the primary channel state. The more states the primary
channel model has, the more realistic the model becomes and the more efficiently
the channel resources can be utilized. In [9], the coexistence of two unlicensed links is considered, where one
link interferes with the transmission of the other with quasi-static fading and in the absence
of channel state information at the nodes. The problem is formulated as a Partially Observable
Markov Decision Process (POMDP) and a greedy solution is proposed. In [10], the secondary
source is allowed to superpose its transmissions over those of the primary source. The secondary
source aims to maximize its own throughput while guaranteeing a bounded performance loss for
the primary source. The work in [11] goes along the same lines, but considers cooperation among secondary users.
In this type of problem, the POMDP framework
is usually needed given the uncertainties in the quality of the primary link and in
the primary user activity as a result of sensing errors [13], [16], [17].

If the transition probabilities of the underlying Markov chain of the primary
channel are unknown, we use a Hidden Markov Model (HMM) framework to learn these transition
probabilities. An HMM formulation arises when the actual system state is not directly observable. HMMs are
widely used in many applications. The detailed procedure for estimating transition probabilities
is provided in [21]. In [23] and [25], HMMs have been used for spectrum sensing. The sub-band
occupancy at any time instant can be considered as a state, which can be either free (unoccupied
by a primary user) or busy (occupied by a primary user). The states of a sub-band are monitored
over L consecutive time periods. Two cases are discussed in [23]. The first
is the situation where the parameters of the HMM and the error probabilities are known;
the Viterbi algorithm is used to overcome the computational complexity of the likelihood solution.
The second is the situation where the parameters of the Markov chain and the
error probabilities are unknown; the Expectation-Maximization (EM) algorithm is used to estimate
these parameters.

Note that HMMs have been used in the context of cognitive radios, as in [22] and [24]. In [22],
an HMM is used to model and predict the spectrum occupancy of licensed radio bands. In [24],
an HMM is used for quickest spectrum detection for cognitive radio, where a frequency sweeping
device sweeps the wide-band spectrum and the samples of the wide-band power spectral density
(PSD) are fed into different HMMs sequentially. In our work, the ARQ feedback is considered as the
secondary user observation vector, which serves as the input to an HMM to estimate
the transition probabilities. The quality of estimation depends on the length of the observation
vector.

In this paper, we assume that the primary user is working in a saturation regime where it 
always has data to transmit over the primary link. It sends one packet in each time slot, and 
receives an ACK or NACK feedback from its receiver at the end of the time slot. The feedback 
is received correctly by both the primary and secondary transmitters. The channel between the 
primary transmitter and receiver is modeled as a Markov process with a finite number of states 
where the channel quality and hence the probability of correct reception is determined by the 
state. The state of the channel does not change over a slot. The channel may switch states at 
the beginning of each slot according to the transition probabilities of the Markov process. We 
also study the special cases where the primary link follows a two-state and a three-state Markov
process. The cognitive user exploits the ACK/NACK feedback from the primary receiver to
predict the quality of the primary link. Note that, similar to other papers, this is also a POMDP 
problem and we can use techniques similar to other approaches. We assume that one secondary
user operates on one primary channel. This is a widely used assumption in the literature. The
system may have a number of orthogonal primary channels where each can be potentially used by 
one secondary terminal. At the beginning of each time slot, the secondary user decides whether 
to remain silent and listen to primary user feedback, or to carry out transmission. The objective 
is to maximize the weighted sum throughput of both the primary and secondary links. 

This optimization problem has an exploration-exploitation tradeoff. The tradeoff is between 
the choice of the cognitive user activity which maximizes the secondary user throughput, and that 
which gives the secondary user knowledge about the channel state information of the primary link 
through the ARQ feedback. The correlation between the channel states, via the Markov process, 
is what enables us to have this trade-off. A similar trade-off can be achieved if there is correlation 
between primary user activity in consecutive time slots [10]. A more general model would be to
assume both correlations. Note that the maximization of the weighted sum throughput can be a 
proxy for the optimization problem of maximizing the secondary throughput under a constraint 
on the primary throughput [6]. Note however that solving the constrained optimization problem
is more general. 

Our contributions in this paper are as follows. We solve the weighted sum throughput
maximization problem by modeling it as a dynamic programming problem, and employ Bellman's
equation [17] to arrive at the optimal strategy. For the two-state case, we prove that the optimal
policy is a threshold-based policy in the belief of the state of the channel. This threshold can
be obtained numerically. Moreover, when the discounting factor of the dynamic programming
problem approaches unity, we obtain a closed-form solution to the weighted sum throughput
optimization problem, and can find an exact description of the strategy that maximizes this throughput
for any weight. Note that changing the weight spans the boundary of the primary-secondary rate
region. We then extend the two-state single-channel case to the case of a channel with three states
and to the multiple-channel case. We obtain numerical solutions to these problems. Finally,
rather than assuming knowledge of the transition probabilities of the above model, we investigate
the case where some of the primary network state transition probabilities are unknown, and we
show how to estimate these parameters using a Hidden Markov Model (HMM) formulation.

One of the advantages of our scheme is that the ARQ feedback can capture the temporal 
correlation in the channel. The cognitive user can access the primary channel in two cases, 
when the primary channel quality is relatively high (primary user can transmit successfully 
regardless of cognitive user activity) and when its quality is very low (primary user transmission
fails whether secondary user is active or not). This advantage cannot be captured in schemes 
employing spectrum sensing only. 

The paper is organized as follows. The general multi-state system model and assumptions
are described in Section II. We also introduce in this section the two-state, Gilbert-Elliot, and
three-state system models as special cases, and the secondary user optimal strategy is obtained using
dynamic programming. In Section III, we extend our formulation to the multi-channel case and
specifically address the two-channel case. When the secondary user has the chance to choose
a channel from multiple primary channels to transmit on, it should choose the
channel that maximizes the reward function, as we illustrate in Section III. The framework of
Hidden Markov Models (HMM) is used in Section IV to estimate the primary network transition
probabilities, if they are unknown, given the observations in the form of ARQ feedback.
Numerical results are presented in Section V. Our work is concluded in Section VI.

II. System Model and Proposed Approach 

Our proposed model assumes that we have one primary link and one secondary link. An 
illustration of the setup is provided in Figure 1. We are concerned with the Z-interference
channel model [18], where the interference from the primary transmitter on the secondary receiver
is ignored. The Z-interference channel models important applications such as the interference 
caused by macro-cell users on femto-cell receivers, which is known in the literature as the "loud 
neighbor" problem. In our context, the primary terminals may be close to one another and use 
small transmission power, whereas the cognitive terminals may be far from one another and use 
high power for communication causing considerable interference on the primary link. 

We assume that the primary link is always active, i.e., the primary transmitter is working in the
saturation regime. The primary link follows a Markov model with $S$ states, $\{1, 2, \ldots, S\}$. Each
of the $S$ states describes a certain quality of the primary channel. The secondary user
is aware of the state transition matrix $\Pi$ of this Markov chain in addition to the probabilities
of success and failure for primary transmission in each of these states. Table II shows these
probabilities for state $i$, $i \in \{1, 2, \ldots, S\}$.

Our objective is to choose the transmission strategy that maximizes the expected value of the
average weighted sum throughput $R$ given by

\[
R = \lim_{K \to \infty} \frac{1}{K} \sum_{n=1}^{K} \left( w R_p(n) + (1 - w) R_s(n) \right) \tag{1}
\]

where $R_p(n)$ and $R_s(n)$ are the primary and secondary throughput in time slot $n$, respectively,
and $0 < w < 1$ is a weighting factor that determines the relative importance of the two rates. In
order to protect the primary user from interference and service interruption, parameter w can be 
chosen close to one. Note that the maximization of the weighted sum throughput can be a proxy 
for the optimization problem of maximizing the secondary throughput under a constraint on 
the primary throughput [6]. Note however that solving the constrained optimization problem is
more general than solving (1) 0. Consider for example a convex primary- secondary throughput 
region, and consider the primary throughput to be constrained to a certain point on the boundary 
of this region. The maximum secondary throughput that can be obtained is the one that solves 
the optimization of the weighted sum throughput where the "weight" produces a line tangent to 
the boundary of the region at this certain point. The weighted sum formulation hence resembles 
imposing a constraint on the primary throughput. Such a constraint is stricter and more meaningful
than the typical interference constraint. An interference constraint is concerned with the total
interference power at the primary receiver regardless of the quality of the primary channel. On 
the other hand, a throughput constraint is related to both the interference imposed on the primary 
receiver and the quality of the primary transmitter-receiver link. If the primary link is in a good 
state, the primary receiver may tolerate more interference from the secondary user without an 
appreciable degradation in performance. 

Let $r_p$ be the primary user reward if the primary user succeeds in transmitting one packet through
the binary erasure channel. Let $r_s$ be the secondary user reward if the secondary user decides
to transmit a packet. Note that we can account for any possible packet loss in the secondary
channel when we calculate the value of $r_s$.

The belief is the secondary user's estimate of the probability of the actual primary link state
in the next time slot given the history of actions and observations. The secondary belief is given
by the vector $\vec{p} = (p_1, p_2, \ldots, p_S)$ with $\sum_{k=1}^{S} p_k = 1$, where $p_k$ is the probability that the primary
channel is in state $k$ at the beginning of a time slot given all the previous secondary user's actions
and observations. The belief is updated by the secondary user in each time slot on the basis
of its chosen action during the time slot, the primary ARQ feedback that it observes, and the
primary channel state transition probabilities. Specifically, the secondary user updates this belief
of the primary state according to the Markovian update equation $\vec{p}_n = \vec{p}_{n-1(\text{updated})} \Pi$, where $\Pi$ is the state transition matrix, $\vec{p}_n$
is the belief vector at time slot $n$, and $\vec{p}_{n-1(\text{updated})}$ is calculated using the ACK or NACK
overheard from the primary receiver according to one of the following four cases, depending
on the action taken by the secondary user and the corresponding outcome.

Case 1: Secondary user transmits and a NACK is received from the primary network. Therefore,
the secondary belief of the primary channel state during the next time slot is

\[
p_{i(\text{updated})}(1) = \frac{P(\text{NACK} \mid i)\, p_i}{\sum_j P(\text{NACK} \mid j)\, p_j} = \frac{(1 - a_i)\, p_i}{\sum_j (1 - a_j)\, p_j} \tag{2}
\]

where $p_{i(\text{updated})}(l)$ is the $i$th element of $\vec{p}_{n-1(\text{updated})}$ for the $l$th case and $P(\text{NACK} \mid i)$ is the
probability of receiving a NACK given the primary channel state is $i$.

Case 2: Secondary user transmits and an ACK is received from the primary network. Therefore,
the secondary belief of the primary channel state during the next time slot is

\[
p_{i(\text{updated})}(2) = \frac{P(\text{ACK} \mid i)\, p_i}{\sum_j P(\text{ACK} \mid j)\, p_j} = \frac{a_i\, p_i}{\sum_j a_j\, p_j} \tag{3}
\]

Case 3: Secondary user is idle and a NACK is received from the primary network. Therefore,
the secondary belief of the primary channel state during the next time slot is

\[
p_{i(\text{updated})}(3) = \frac{(1 - b_i)\, p_i}{\sum_j (1 - b_j)\, p_j} \tag{4}
\]

Case 4: Secondary user is idle and an ACK is received from the primary network. Therefore,
the secondary belief of the primary channel state during the next time slot is

\[
p_{i(\text{updated})}(4) = \frac{b_i\, p_i}{\sum_j b_j\, p_j} \tag{5}
\]

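Then, depending on the case, we have $\vec{p}_n(l) = \vec{p}_{n-1(\text{updated})}(l)\, \Pi$.

If the secondary user decides to remain silent, the expected immediate gain in the next time
slot can be calculated as:

\[
G_1(\vec{p}) = w r_p P(\text{ACK} \mid \text{idle}) = w r_p \Big( \sum_i b_i p_i \Big) \tag{6}
\]

where $P(\text{ACK} \mid \text{idle})$ is the probability of receiving an ACK given the secondary user is idle, and $b_i$ is the per-state ACK probability used in Cases 3 and 4.
If the secondary user is transmitting, the expected immediate gain in the next time slot can be
calculated as:

\[
G_2(\vec{p}) = (1 - w) r_s + w r_p P(\text{ACK} \mid \text{transmit}) = (1 - w) r_s + w r_p \Big( \sum_i a_i p_i \Big) \tag{7}
\]

where $a_i$ is the per-state ACK probability used in Cases 1 and 2. Thus, the expected immediate reward is:

\[
R(k, \vec{p}, a) =
\begin{cases}
G_1(\vec{p}) & \text{if the secondary user is silent in time slot } k \\
G_2(\vec{p}) & \text{if the secondary user is active in time slot } k
\end{cases}
\]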
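The four update cases and the two immediate gains can be sketched in code as follows. This is a minimal illustration rather than a definitive implementation: the function names `update_belief` and `immediate_gains`, the use of NumPy, and the argument layout are assumptions. The sketch also assumes that $a_i$ and $b_i$ are the per-state ACK probabilities when the secondary user transmits and is idle, respectively, as in Cases 1 to 4.

```python
import numpy as np


def update_belief(p, a, b, Pi, transmit, ack):
    """Return the belief for the next slot (hypothetical helper).

    p        : current belief over the S states (sums to 1)
    a, b     : assumed P(ACK | state i) when the secondary user
               transmits (a) or stays idle (b)
    Pi       : S x S transition matrix, Pi[i, j] = P(next = j | now = i)
    transmit : True if the secondary user transmitted in this slot
    ack      : True if an ACK was overheard, False for a NACK
    """
    p = np.asarray(p, dtype=float)
    ack_prob = np.asarray(a if transmit else b, dtype=float)
    likelihood = ack_prob if ack else 1.0 - ack_prob
    posterior = likelihood * p        # numerator of the Bayes update
    total = posterior.sum()           # probability of the observation
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize, then propagate one slot through the Markov chain.
    return (posterior / total) @ np.asarray(Pi, dtype=float)


def immediate_gains(p, a, b, w, r_p, r_s):
    """Expected immediate gains (idle, transmit) for belief p."""
    p = np.asarray(p, dtype=float)
    gain_idle = w * r_p * float(np.dot(b, p))
    gain_transmit = (1.0 - w) * r_s + w * r_p * float(np.dot(a, p))
    return gain_idle, gain_transmit
```

For instance, with two states ordered as (erasure, non-erasure), `a = [0, 0]` and `b = [0, 1]`, an ACK overheard while idle collapses the belief onto the non-erasure row of `Pi`.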
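The optimal strategy is the strategy that maximizes the following discounted reward function

\[
V^K(\vec{p}) = \max_{a_k, \ldots, a_{K+k-1}} E\left\{ \sum_{n=k}^{K+k-1} \alpha^{n-k} R(n, \vec{p}_n, a_n) \,\Big|\, \vec{p}_k = \vec{p} \right\} \tag{8}
\]

$V^K(\vec{p})$ satisfies the following Bellman equation [17]:

\[
\begin{aligned}
V^K(\vec{p}) = \max\Big\{ & w r_p P(\text{ACK} \mid \text{idle}) + \alpha P(\text{ACK} \mid \text{idle})\, V^{K-1}(\vec{p}(4)) \\
& + \alpha \big(1 - P(\text{ACK} \mid \text{idle})\big)\, V^{K-1}(\vec{p}(3)), \\
& (1 - w) r_s + w r_p P(\text{ACK} \mid \text{transmit}) + \alpha P(\text{ACK} \mid \text{transmit})\, V^{K-1}(\vec{p}(2)) \\
& + \alpha \big(1 - P(\text{ACK} \mid \text{transmit})\big)\, V^{K-1}(\vec{p}(1)) \Big\}
\end{aligned} \tag{9}
\]

where

\[
V^1(\vec{p}) = \max\big\{ w r_p P(\text{ACK} \mid \text{idle}),\ (1 - w) r_s + w r_p P(\text{ACK} \mid \text{transmit}) \big\} \tag{10}
\]

When $K = \infty$, let $V(\vec{p})$ denote the maximum achievable discounted reward function. $V(\vec{p})$ satisfies
the following Bellman equation [17]:

\[
\begin{aligned}
V(\vec{p}) = \max\Big\{ & w r_p P(\text{ACK} \mid \text{idle}) + \alpha P(\text{ACK} \mid \text{idle})\, V(\vec{p}(4)) + \alpha \big(1 - P(\text{ACK} \mid \text{idle})\big)\, V(\vec{p}(3)), \\
& (1 - w) r_s + w r_p P(\text{ACK} \mid \text{transmit}) + \alpha P(\text{ACK} \mid \text{transmit})\, V(\vec{p}(2)) \\
& + \alpha \big(1 - P(\text{ACK} \mid \text{transmit})\big)\, V(\vec{p}(1)) \Big\}
\end{aligned} \tag{11}
\]

The solution can be obtained via Dynamic Programming. We defer the explanation of our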
algorithm to the following subsections where we introduce some special cases to better explain 
our approach. 
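Before turning to the special cases, the finite-horizon recursion in (9) and (10) can be illustrated by a direct, unoptimized recursion. The sketch below rests on several assumptions: `finite_horizon_value` is a hypothetical name, $V^0 \equiv 0$ is assumed as the terminal condition (which reproduces (10) for $K = 1$), and `a` and `b` denote the per-state ACK probabilities when the secondary user transmits or is idle. Its cost grows exponentially with $K$, so it is only meant for small horizons.

```python
import numpy as np


def finite_horizon_value(p, K, a, b, Pi, w, r_p, r_s, alpha):
    """Evaluate V^K(p) by recursing on the Bellman equation (small K only)."""
    if K == 0:
        return 0.0  # assumed terminal condition V^0 = 0
    p = np.asarray(p, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    Pi = np.asarray(Pi, dtype=float)

    def action_value(ack_prob, immediate):
        total = immediate
        # Sum over the two possible observations: ACK first, then NACK.
        for likelihood in (ack_prob, 1.0 - ack_prob):
            prob = float(likelihood @ p)
            if prob > 0.0:
                next_belief = (likelihood * p / prob) @ Pi
                total += alpha * prob * finite_horizon_value(
                    next_belief, K - 1, a, b, Pi, w, r_p, r_s, alpha)
        return total

    idle = action_value(b, w * r_p * float(b @ p))
    transmit = action_value(a, (1.0 - w) * r_s + w * r_p * float(a @ p))
    return max(idle, transmit)
```

A practical solver would instead cache or discretize the belief, since the set of reachable beliefs grows with every additional slot in the horizon.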

A. Erasure/Non-Erasure Channel Model 

The primary link follows a two-state Markov chain. This is a special case of the model
described above, where $S = 2$, and the $a_i$'s and $b_i$'s are all '1' or '0'. In particular, we assume
that during each time slot, the primary link is either in an erasure state $E$, where primary user
transmission always fails, or a non-erasure state $N$, where primary user transmission is successful
when there is no interference. It switches states from one time slot to the next according to a
Markovian process as shown in Figure 2. The process is specified by two parameters $P_{EE}$ and
$P_{NE}$, where $P_{EE}$ is the probability that the primary network is in erasure in the next time slot
given that it is in the erasure state in the current slot, and $P_{NE}$ is the probability that the primary
network is in erasure in the next time slot given that it is in the non-erasure state in the current
slot. We assume that the transition probabilities of the Markov chain are known a priori, but we
introduce a technique for estimating them in Section IV. The transition matrix $P$, which includes
the transition probabilities, is given by

\[
P = \begin{pmatrix} P_{EE} & P_{EN} \\ P_{NE} & P_{NN} \end{pmatrix}
\]

Hence, the stationary probabilities of being in erasure and non-erasure for the primary network are
$P(E) = \frac{P_{NE}}{P_{NE} + P_{EN}}$ and $P(N) = \frac{P_{EN}}{P_{NE} + P_{EN}}$, respectively. Note that the erasure state causes the
primary transmission to fail, while the non-erasure state results in successful packet delivery 
to the primary receiver only when there is no interference from the secondary transmitter. That 
is, if the cognitive user decides to transmit in the non-erasure state, its transmission causes the 
erasure of the primary user packet. 
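As a quick numerical check of the stationary probabilities, the sketch below verifies that they are left unchanged by one step of the transition matrix. The function name `stationary_two_state` and the numerical values are illustrative assumptions, and the chain is assumed to satisfy $P_{NE} + P_{EN} > 0$.

```python
import numpy as np


def stationary_two_state(P_EE, P_NE):
    """Stationary probabilities (P(E), P(N)) of the two-state chain."""
    P_EN = 1.0 - P_EE
    denom = P_NE + P_EN               # assumed strictly positive
    return P_NE / denom, P_EN / denom


# Illustrative parameter values (not taken from the paper).
P_EE, P_NE = 0.8, 0.1
P = np.array([[P_EE, 1.0 - P_EE],
              [P_NE, 1.0 - P_NE]])    # rows: current state E, N
pi = np.array(stationary_two_state(P_EE, P_NE))
is_stationary = np.allclose(pi @ P, pi)   # pi P = pi holds for a stationary law
```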

As long as the cognitive radio is idle, it can eavesdrop on the primary user ARQ, through
which the secondary transmitter can detect the current state of the primary link and, consequently,
determine whether the erasure probability of the next slot is $P_{EE}$ or $P_{NE}$.
However, if the cognitive radio decides to transmit, it causes the primary user packet to be erased.
The secondary user then overhears a negative acknowledgment (NACK) from the primary receiver
no matter what the state of the primary channel is. Thus, when the cognitive user transmits, it
becomes unaware of the primary link state because of the collision it causes with the primary
transmitted data. The primary receiver will not be able to decode the data and will send
a NACK to its transmitter irrespective of its actual channel state. Hence, the
ACK/NACK feedback provides no information, because the secondary transmitter is unable to
decide whether the NACK is due to the collision or to the fact that the
primary link was in erasure.

The belief here is defined as the probability that the primary link is in the erasure state at the very
beginning of the time slot from the secondary terminal's perspective, given the history of its
actions and observations. Hence, the expected primary throughput in time slot $k$ as estimated
by the secondary transmitter is given by $r_p (1 - p_k)$. The belief is updated from one time
slot to another using the Markov property as follows:

\[
p_k =
\begin{cases}
P_{EE} & \text{if current state is erasure} \\
P_{NE} & \text{if current state is non-erasure} \\
p_{k-1} P_{EE} + (1 - p_{k-1}) P_{NE} & \text{if current state is unknown}
\end{cases}
\]

Note that the third possibility occurs when the secondary user transmits. The expected secondary
throughput in time slot $k$ is, as before, given by $r_s$.
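The three-way belief update can be written compactly as below. The helper name `next_belief` and the encoding of the observation as `"ACK"`, `"NACK"`, or `None` (when the secondary user transmitted and therefore learned nothing) are assumptions made for illustration.

```python
def next_belief(p_prev, P_EE, P_NE, observation):
    """Belief that the next slot is in erasure.

    observation: "NACK" or "ACK" overheard while silent, or None if the
    secondary user transmitted and the state stayed unknown.
    """
    if observation == "NACK":      # silent, state was erasure
        return P_EE
    if observation == "ACK":       # silent, state was non-erasure
        return P_NE
    if observation is None:        # transmitted, propagate the old belief
        return p_prev * P_EE + (1.0 - p_prev) * P_NE
    raise ValueError("observation must be 'ACK', 'NACK', or None")
```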

The optimal strategy and maximum throughput can be derived using one of the following 
approaches. 

1) Dynamic Programming: We propose to use dynamic programming techniques to arrive at
the optimal strategy. The secondary user opportunistically accesses the channel by first
synchronizing with the slot structure of the primary network. The goal of the secondary user is to transmit
during the erasure states, and allow the primary user to transmit during the non-erasure states
in order to maximize the sum throughput. The main challenge is that the secondary user cannot
know the next state exactly. It has to operate on the basis of its belief $p_k$. At the beginning of
each time slot, and based on previous actions and observations, the secondary user can calculate
the probability $p_k$. Dynamic programming can then be used to find the optimal strategy. The
decision at each value of $p_k$ maximizes the expected reward function, and hence the optimal
strategy at time slot $k$ is the strategy that maximizes the following discounted reward function

\[
V^K(p) = \max_{a_k, a_{k+1}, \ldots, a_{K+k-1}} E\left\{ \sum_{n=k}^{K+k-1} \alpha^{n-k} R(n, p_n, a_n) \,\Big|\, p_k = p \right\} \tag{12}
\]



where $0 < \alpha < 1$ is a discounting factor, $K \in \{1, 2, \ldots\}$ is the control horizon, and $a_n$ is the
action taken at time $n$. As $\alpha$ decreases, the secondary user puts more emphasis on its short-term
gains. The term $R(n, p_n, a_n)$ is the reward accrued at instant $n$ when the belief is $p_n$ and
the action taken is $a_n$. In this paper, the reward is given by

\[
R(n, p_n, a_n) = w R_p(n, p_n, a_n) + (1 - w) R_s(n, p_n, a_n) \tag{13}
\]

where $R_p$ and $R_s$ are the primary and secondary throughputs, respectively.

Note that at a certain time instant and a certain $p_k$ value, the path through the "decision tree"
depends on the decision taken, i.e., on whether to transmit or not. For that particular $p_k$ value at any
other time instant, the optimal decision would not change. Hence we eliminate the subscript $k$,
because the expected future reward depends only on the value of $p$ regardless of the index of
the time slot.

If we assume an infinite horizon optimization, then, through a dynamic programming argument,
the state of the system can be fully parameterized by the belief that the channel is in erasure in
the next time slot, $p$, where we dropped the time dependence. Hence the action taken by the
cognitive user depends only on $p$. The belief state $p$ can be updated according to one of the
following three cases, depending on the action taken by the secondary user and the corresponding
outcome. We follow here the notation presented in [5].

Case 1: Secondary user is silent and a positive acknowledgment (ACK) is received from the
primary network. The ACK implies that the primary network has been in the non-erasure state, and
the primary receiver has succeeded in decoding the packet. Therefore, the secondary belief that the
channel will be in erasure during the next time slot is

\[
p(1) = P_{NE} \tag{14}
\]

where $p(l)$ is the update expression for $p$ for the $l$th case. Each case is a certain combination
of secondary user decision and observation.

Case 2: Secondary user is silent and a negative acknowledgment (NACK) is received from
the primary network. This implies that the primary network has been in the erasure state and the sent
packet has not been delivered successfully. Thus,

\[
p(2) = P_{EE} \tag{15}
\]

Case 3: Secondary user transmits. The probability of erasure is updated by the Markovian
property as follows

\[
p(3) = p P_{EE} + (1 - p) P_{NE} \tag{16}
\]

We denote the weighted expected instantaneous throughput when the secondary user listens by
$G_1(p)$, which is given by

\[
G_1(p) = w r_p (1 - p) \tag{17}
\]

When the secondary user is transmitting, we denote the weighted expected throughput
by

\[
G_2(p) = (1 - w) r_s \tag{18}
\]

A greedy scheme would just compare $G_1(p)$ with $G_2(p)$: if $G_2(p) > G_1(p)$, the secondary
user decides to transmit; otherwise it remains silent. The expected instantaneous reward is:

\[
R(k, p, a) =
\begin{cases}
G_1(p) & \text{if the secondary user is silent at time slot } k \\
G_2(p) & \text{if the secondary user is active at time slot } k
\end{cases}
\]

Following the definitions in [5], let $V^K(p)$ denote the maximum achievable discounted reward
function obtained by maximizing Equation (12). When $K < \infty$, $V^K(p)$ satisfies the following
Bellman equation [17]:

\[
V^K(p) = \max\Big\{ w r_p (1 - p) + \alpha (1 - p) V^{K-1}(p(1)) + \alpha p V^{K-1}(p(2)),\ (1 - w) r_s + \alpha V^{K-1}(p(3)) \Big\} \tag{19}
\]

where $w r_p (1 - p)$ and $\alpha (1 - p) V^{K-1}(p(1)) + \alpha p V^{K-1}(p(2))$ correspond to the expected
immediate primary user reward and total discounted future reward, respectively, if the secondary
user does not transmit, and $(1 - w) r_s$ and $\alpha V^{K-1}(p(3))$ correspond to the expected immediate
secondary reward and total discounted future reward, respectively, if the secondary user transmits. For $K = 1$,

\[
V^1(p) = \max\big\{ w r_p (1 - p),\ (1 - w) r_s \big\} \tag{20}
\]

When $K = \infty$, $V^K(p) = V^{K-1}(p) = V(p)$, which satisfies the following Bellman equation:

\[
V(p) = \max\Big\{ w r_p (1 - p) + \alpha (1 - p) V(p(1)) + \alpha p V(p(2)),\ (1 - w) r_s + \alpha V(p(3)) \Big\} \tag{21}
\]

$V(p)$ can be obtained iteratively. The optimal strategy is a function of $p$. We show
in Appendix VII-A that the optimal strategy is a threshold-based policy, i.e., the secondary user
will transmit if $p > p_{th}$. In the next subsection, we present a closed-form expression for the
maximum throughput and optimal strategy when the discounting factor $\alpha$ tends to unity.
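One way to obtain $V(p)$ iteratively is value iteration on a discretized belief grid, sketched below. The grid, the linear interpolation via `numpy.interp`, the stopping tolerance, and the function name `value_iteration` are illustrative assumptions rather than the procedure used to produce the results in this paper. The discounting factor is assumed to satisfy $\alpha < 1$ so that the iteration converges.

```python
import numpy as np


def value_iteration(P_EE, P_NE, w, r_p, r_s, alpha,
                    grid_size=1001, tol=1e-10, max_iter=100_000):
    """Approximate V(p) of the two-state model on a uniform belief grid."""
    grid = np.linspace(0.0, 1.0, grid_size)
    p3 = grid * P_EE + (1.0 - grid) * P_NE       # belief after transmitting

    def action_values(V):
        v_after_ack = np.interp(P_NE, grid, V)    # V(p(1))
        v_after_nack = np.interp(P_EE, grid, V)   # V(p(2))
        listen = (w * r_p * (1.0 - grid)
                  + alpha * ((1.0 - grid) * v_after_ack + grid * v_after_nack))
        transmit = (1.0 - w) * r_s + alpha * np.interp(p3, grid, V)
        return listen, transmit

    V = np.zeros(grid_size)
    for _ in range(max_iter):
        listen, transmit = action_values(V)
        V_new = np.maximum(listen, transmit)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    listen, transmit = action_values(V)
    return grid, V, transmit > listen             # True where transmitting wins


# Illustrative parameters (assumed values, not from the paper).
grid, V, transmit_mask = value_iteration(0.8, 0.1, w=0.6, r_p=1.0, r_s=1.0, alpha=0.9)
# Smallest grid belief at which transmitting is preferred, if any.
p_th_estimate = grid[transmit_mask].min() if transmit_mask.any() else None
```

The estimate of $p_{th}$ is only as fine as the grid spacing, so a denser grid may be needed when the two action values are close over a range of beliefs.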

2) Closed Form Solution when $\alpha \to 1$: We assume that $P_{EE} > P_{NE}$, which makes the belief $p_k$ a monotonic function of time as long as the secondary user is transmitting [19]. This can be readily seen by solving the first-order difference equation governing the evolution of $p_k$ to obtain

$$p_k = (P_{EE} - P_{NE})^K p_{k-K} + \left[ 1 - (P_{EE} - P_{NE})^K \right] P(E) \tag{22}$$

where $P_{EE}$ and $P_{NE}$ are the Markov chain transition probabilities, $P(E)$ is the steady-state probability of being in the erasure state E, where $P(E) = \frac{P_{NE}}{1 + P_{NE} - P_{EE}}$, and $p_k$ is the probability of being in erasure in time slot $k$.

Using this equation, we can find the probability of erasure in time slot $k$, $p_k$, as a function of the probability of erasure in time slot $(k - K)$, $p_{k-K}$.

For example, if we consider a single time slot shift ($K = 1$), we obtain the basic Markovian property:

$$p_k = p_{k-1} P_{EE} + [1 - p_{k-1}] P_{NE}$$

It is clear that if $P_{EE} - P_{NE} > 0$, the belief $p_k$ is a monotonic function of time; otherwise the term $(P_{EE} - P_{NE})^K$ oscillates between positive and negative values.
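As a quick numerical check of the closed form (22), the sketch below compares it against direct iteration of the one-step Markov update. The function names are hypothetical, and the steady-state erasure probability is taken as $P_{NE}/(1 + P_{NE} - P_{EE})$, the stationary probability of a two-state chain.

```python
def belief_closed_form(p_start, K, p_ee, p_ne):
    """Erasure belief K slots after p_start, using the closed form (22)."""
    d = p_ee - p_ne
    p_steady = p_ne / (1.0 + p_ne - p_ee)  # steady-state erasure probability
    return d ** K * p_start + (1.0 - d ** K) * p_steady


def belief_iterated(p_start, K, p_ee, p_ne):
    """Same quantity, obtained by applying the one-step update K times."""
    p = p_start
    for _ in range(K):
        p = p * p_ee + (1.0 - p) * p_ne
    return p
```

Starting from a known erasure (`p_start = 1.0`) with $P_{EE} > P_{NE}$, the belief decays monotonically toward the steady-state value, which is the monotonicity used in the text.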

If the inequality $w r_p (1 - P_{EE}) > (1-w) r_s$ is satisfied, then the inequality $w r_p (1 - P_{NE}) > (1-w) r_s$ is also satisfied, since $P_{EE}$ is greater than $P_{NE}$. From these two inequalities, we can deduce that the optimal secondary user strategy is to always listen, because the expected primary throughput is greater than the expected secondary throughput regardless of the actual system state, whether it is E or N. Similarly, if $w r_p (1 - P_{NE}) < (1-w) r_s$, the optimal secondary strategy is to always transmit. For any other condition, the optimal secondary strategy is as follows. The secondary transmitter listens as long as an ACK is received, because $w r_p (1 - P_{NE}) > (1-w) r_s$; in that case, maximizing the throughput in the next time slot is optimal, since future decisions are not affected: the secondary user learns the next state while it is silent. Once a NACK is received, the secondary user transmits $M$ consecutive packets. Thus, the maximization problem in (12) can be reduced to an optimization problem over the number $M$ of consecutive packets transmitted by the secondary user.

It is hence possible to model the system as seen by the secondary user as a three-state Markov chain, in which the secondary user either knows that the primary link is in erasure or non-erasure (while silent) or sends $M$ packets without knowing the exact state of the primary link. Figure 3 shows such a representation of our algorithm. The first state, N, represents the case where the secondary user is silent and the primary channel is in the non-erasure state. The secondary user overhears an ACK at the end of the primary transmission, and hence the probability of remaining in the same state is $P_{NN}$ and the probability of moving to state E is $P_{NE}$. State E is the state where the primary link is in the erasure state. The secondary user receives a NACK at the end of the primary user transmission, and hence moves to the "Send M packets" state (state S). The secondary user transmits $M$ consecutive packets, then returns to the idle state, listens to the feedback again, and moves to state E or N based on this feedback.

When the Markov chain is in state N, the primary network achieves a throughput of $r_p$. When it is in state S, the secondary user achieves a throughput of $M r_s$, as the system remains in this state for $M$ time slots.

In order to find an expression for the throughput as a function of $M$, we find the stationary distribution of the Markov chain. Let the steady-state probabilities of the N, E and S states be $P_N^{ss}$, $P_E^{ss}$ and $P_S^{ss}$, respectively. Then

$$P_N^{ss} = \frac{1 - T^M(P_{EE})}{1 + 2P_{NE} - T^M(P_{EE})} \tag{23}$$

$$P_E^{ss} = P_S^{ss} = \frac{P_{NE}}{1 + 2P_{NE} - T^M(P_{EE})} \tag{24}$$

where $T^M(P_{EE})$ is the probability of erasure in time slot $k$ given that the state of the Markov chain in time slot $k - M$ was erasure. We can find $T^M(P_{EE})$ from the two-state Markov chain:

$$T^M(P_{EE}) = \frac{P_{NE} + (P_{EE} - P_{NE})^{M+1}(1 - P_{EE})}{1 + P_{NE} - P_{EE}} \tag{25}$$

Recall that we assume a positively correlated channel with $P_{EE} > P_{NE}$.
The average primary throughput $R_p$ and the average secondary throughput $R_s$ can be written as:

$$R_p = \frac{r_p P_N^{ss}}{P_N^{ss} + P_E^{ss} + M P_S^{ss}} \tag{26}$$

$$R_s = \frac{M r_s P_S^{ss}}{P_N^{ss} + P_E^{ss} + M P_S^{ss}} \tag{27}$$

$$R(M) = w R_p + (1-w) R_s \tag{28}$$

The problem now becomes finding the $M$ that maximizes the average weighted sum throughput $R(M)$. We notice that the optimal value of $M$ depends on the weight $w$. The throughput obtained using this scheme spans a number of points on the outer bound of the rate region, corresponding to the optimal integer values of $M$ that maximize the weighted sum throughput. The outer bound of the rate region here is piecewise linear, and can be achieved by time division multiplexing between the different integer values of $M$.
Two remarks are in order here: 

1. We show in Appendix VII-A that the optimal strategy is a threshold-based policy on the belief $p_k$, and since the belief $p_k$ is monotonic in $M$, finding the threshold amounts to finding the value of $M$ that maximizes the throughput expression.

2. It can be shown that the throughput is a quasi-concave function of $M$, and through some algebraic manipulations, one can arrive at the value of $M$ that maximizes the throughput. This can be shown by subtracting $R(M)$ from $R(M + 1)$. Treating $M$ as a continuous variable, we can show that this difference has only one positive finite root that is greater than or equal to unity. By finding this root, we find the unique optimal value of $M$. This proof is given in Appendix VII-B.
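The search over $M$ can also be carried out numerically. The sketch below evaluates the weighted throughput (28) from the stationary probabilities (23)-(24) and the erasure probability (25), then scans integer values of $M$. It is an illustrative brute-force search rather than the root-finding argument of Appendix VII-B; the function names and the search bound `max_M` are assumptions.

```python
def erasure_prob_after(M, p_ee, p_ne):
    """T^M(P_EE) from (25): erasure probability when listening resumes."""
    d = p_ee - p_ne
    return (p_ne + d ** (M + 1) * (1.0 - p_ee)) / (1.0 + p_ne - p_ee)


def weighted_throughput(M, w, p_ee, p_ne, r_p=1.0, r_s=1.0):
    """Return (R(M), R_p, R_s) following equations (23)-(28)."""
    t = erasure_prob_after(M, p_ee, p_ne)
    denom = 1.0 + 2.0 * p_ne - t
    pi_n = (1.0 - t) / denom      # stationary probability of state N
    pi_e = p_ne / denom           # stationary probability of state E
    pi_s = pi_e                   # every visit to E is followed by a visit to S
    slots = pi_n + pi_e + M * pi_s  # state S lasts M time slots
    R_p = r_p * pi_n / slots
    R_s = M * r_s * pi_s / slots
    return w * R_p + (1.0 - w) * R_s, R_p, R_s


def best_M(w, p_ee, p_ne, r_p=1.0, r_s=1.0, max_M=500):
    """Integer M in [1, max_M] that maximizes R(M), found by exhaustive scan."""
    return max(range(1, max_M + 1),
               key=lambda M: weighted_throughput(M, w, p_ee, p_ne, r_p, r_s)[0])
```

Sweeping `w` over a grid and recording the pair `(R_p, R_s)` at `best_M(w, ...)` traces the integer-$M$ points of the rate region described above.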



B. Gilbert-Elliot Model 

In the previous model, if the primary channel is in an erasure state, it fails to deliver any packet with probability one, while in a non-erasure state it succeeds in delivering any packet with probability one. Here we use the more general good-bad Gilbert-Elliot model (B and G states), where the probability of successful delivery of a packet depends on the state of the channel but is not exactly 1 or 0. The probabilities also depend on whether the secondary user is transmitting or not, as discussed before. For example, when the primary channel is in the bad state B and the secondary user is silent, there is a probability $\gamma_1$ of receiving an ACK (signifying correct reception of a packet) from the primary network. Table II gives the probabilities of such a channel model. This is the same as the general case when $S = 2$, and here we set $\gamma_1 = a_1$, $\gamma_2 = b_1$, $\gamma_3 = a_2$ and $\gamma_4 = b_2$.





|      | Sec. silent | Sec. transmit |
|------|-------------|---------------|
| Bad  | $\gamma_1$  | $\gamma_2$    |
| Good | $\gamma_3$  | $\gamma_4$    |

TABLE II

Probability of successful delivery of primary user packets



Typically, the probabilities $\gamma_1$, $\gamma_2$ and $\gamma_4$ should be close to zero and the probability $\gamma_3$ should be close to one. The Erasure/Non-Erasure channel model corresponds to the case $\gamma_1 = \gamma_2 = \gamma_4 = 0$ and $\gamma_3 = 1$.

The state of the system can be fully parameterized by $p$, the belief that the channel will be in state B in the next time slot. The action taken by the cognitive user depends only on $p$. The belief $p$ is updated according to one of the following four cases, depending on the action taken by the secondary user and the corresponding outcome.

Case 1: Secondary user transmits and a negative acknowledgment (NACK) is received from the primary network. Therefore, the secondary belief that the channel will be in state B during the next time slot is

$$p(1) = P(B \mid NACK) P_{BB} + (1 - P(B \mid NACK)) P_{GB} \tag{29}$$

where $P(B \mid NACK)$ is the probability of state B in the current time slot given that the secondary user receives a NACK in the same time slot, and $p(k)$ is the update expression for $p$ for the $k$th case. $P(B \mid NACK)$ can be obtained from Bayes' rule as follows.

$$P(B \mid NACK) = \frac{P(NACK \mid B)P(B)}{P(NACK \mid B)P(B) + P(NACK \mid G)P(G)} = \frac{(1-\gamma_2)p}{(1-\gamma_2)p + (1-\gamma_4)(1-p)} \tag{30}$$

Case 2: Secondary user is transmitting and a positive acknowledgment (ACK) is received from the primary network. Thus,

$$p(2) = P(B \mid ACK) P_{BB} + (1 - P(B \mid ACK)) P_{GB} \tag{31}$$

where

$$P(B \mid ACK) = \frac{P(ACK \mid B)P(B)}{P(ACK \mid B)P(B) + P(ACK \mid G)P(G)} = \frac{\gamma_2 p}{\gamma_2 p + \gamma_4(1-p)} \tag{32}$$

Case 3: Secondary user is silent and a NACK is received from the primary network.

$$p(3) = P(B \mid NACK) P_{BB} + (1 - P(B \mid NACK)) P_{GB} \tag{33}$$

where

$$P(B \mid NACK) = \frac{P(NACK \mid B)P(B)}{P(NACK \mid B)P(B) + P(NACK \mid G)P(G)} = \frac{(1-\gamma_1)p}{(1-\gamma_1)p + (1-\gamma_3)(1-p)} \tag{34}$$

Case 4: Secondary user is idle and an ACK is received from the primary network.

$$p(4) = P(B \mid ACK) P_{BB} + (1 - P(B \mid ACK)) P_{GB} \tag{35}$$

where

$$P(B \mid ACK) = \frac{P(ACK \mid B)P(B)}{P(ACK \mid B)P(B) + P(ACK \mid G)P(G)} = \frac{\gamma_1 p}{\gamma_1 p + \gamma_3(1-p)} \tag{36}$$
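The four cases share one structure: a Bayes step on the current state followed by one step of the Markov chain. A minimal sketch of that update, with hypothetical function and argument names, could look as follows. The tuple `gammas` holds $(\gamma_1, \gamma_2, \gamma_3, \gamma_4)$, the ACK probabilities for (bad, silent), (bad, transmit), (good, silent) and (good, transmit).

```python
def next_belief(p, transmitting, ack, gammas, p_bb, p_gb):
    """Next-slot belief of state B after one observation (cases 1 to 4).

    p            current belief that the channel is in state B
    transmitting True if the secondary user transmitted in this slot
    ack          True if an ACK was overheard, False for a NACK
    """
    g1, g2, g3, g4 = gammas
    ack_bad, ack_good = (g2, g4) if transmitting else (g1, g3)
    # Likelihood of the observed feedback in each channel state
    like_bad = ack_bad if ack else 1.0 - ack_bad
    like_good = ack_good if ack else 1.0 - ack_good
    evidence = like_bad * p + like_good * (1.0 - p)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this belief")
    posterior_bad = like_bad * p / evidence            # Bayes' rule
    return posterior_bad * p_bb + (1.0 - posterior_bad) * p_gb  # Markov step
```

With $\gamma_1 = \gamma_2 = \gamma_4 = 0$ and $\gamma_3 = 1$, a silent secondary user recovers the erasure-model updates: a NACK gives $P_{BB}$ and an ACK gives $P_{GB}$.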

The parameter $p$ characterizing the belief state is updated according to one of the previous four cases. If the secondary user is listening, the expected current gain can be calculated as:

$$G_1(p) = w r_p P(ACK \mid idle) \tag{37}$$

where $P(ACK \mid idle)$ is the probability of receiving an ACK given that the secondary user is idle:

$$P(ACK \mid idle) = \gamma_1 p + \gamma_3 (1-p) \tag{38}$$

If, instead, the secondary user is transmitting, the expected current gain is:

$$G_2(p) = (1-w) r_s + w r_p P(ACK \mid transmit) \tag{39}$$

where $P(ACK \mid transmit)$ is the probability of receiving an ACK given that the secondary user is transmitting:

$$P(ACK \mid transmit) = \gamma_2 p + \gamma_4 (1-p) \tag{40}$$

The expected current reward is:

$$R(k,p,a) = \begin{cases} G_1(p) & \text{if the secondary user is silent in time slot } k \\ G_2(p) & \text{if the secondary user is active in time slot } k \end{cases}$$

The optimal strategy is the strategy that maximizes the following discounted reward function:

$$E\left\{ \sum_{n=k}^{K+k-1} \alpha^{n-k} R(n,p_n,a_n) \,\middle|\, p_k = p \right\} \tag{41}$$

$V^K(p)$ satisfies the following Bellman equation [11]:

$$V^K(p) = \max\Big\{ w r_p P(ACK \mid idle) + \alpha P(ACK \mid idle) V^{K-1}(p(4)) + \alpha (1 - P(ACK \mid idle)) V^{K-1}(p(3)),$$
$$(1-w) r_s + w r_p P(ACK \mid transmit) + \alpha P(ACK \mid transmit) V^{K-1}(p(2)) + \alpha (1 - P(ACK \mid transmit)) V^{K-1}(p(1)) \Big\} \tag{42}$$

where

$$V^1(p) = \max\{ w r_p P(ACK \mid idle),\ (1-w) r_s + w r_p P(ACK \mid transmit) \} \tag{43}$$

When $K = \infty$, $V(p)$ denotes the maximum achievable discounted reward function. $V(p)$ satisfies the following Bellman equation [11]:

$$V(p) = \max\Big\{ w r_p P(ACK \mid idle) + \alpha P(ACK \mid idle) V(p(4)) + \alpha (1 - P(ACK \mid idle)) V(p(3)),$$
$$(1-w) r_s + w r_p P(ACK \mid transmit) + \alpha P(ACK \mid transmit) V(p(2)) + \alpha (1 - P(ACK \mid transmit)) V(p(1)) \Big\} \tag{44}$$

We show in Appendix VII-C that the optimum policy is a threshold-based policy in the value of $p$. We solve Equation (42) numerically. Specifically, we solve it iteratively by approximating the value function at a finite number of belief values on a grid. The value function is initialized and then (42) is used to update it. For $p$ values not belonging to the grid, interpolation or extrapolation is used. After convergence, the secondary terminal decides whether to transmit or listen based on the term that maximizes $V(p)$ at each value of $p$.
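The grid-based procedure just described can be sketched as below. This is one possible realization under stated assumptions, not the authors' code: a uniform grid on $[0, 1]$, linear interpolation between grid points, a zero initial value function and a sup-norm stopping rule are all choices made here for illustration, and every function name is hypothetical.

```python
def interp(values, p):
    """Linearly interpolate a function sampled on a uniform grid over [0, 1]."""
    n = len(values) - 1
    x = min(max(p, 0.0), 1.0) * n
    i = min(int(x), n - 1)
    frac = x - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]


def bayes_next(p, like_bad, like_good, p_bb, p_gb):
    """Return (next belief, probability of the observation)."""
    evidence = like_bad * p + like_good * (1.0 - p)
    if evidence <= 0.0:
        return None, 0.0
    post = like_bad * p / evidence
    return post * p_bb + (1.0 - post) * p_gb, evidence


def solve_gilbert_elliot(w, gammas, p_bb, p_gb, r_p=1.0, r_s=1.0,
                         alpha=0.9, n_grid=101, tol=1e-9, max_iter=10000):
    """Value iteration for the Bellman equation (44) on a belief grid."""
    g1, g2, g3, g4 = gammas
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    V = [0.0] * n_grid
    transmit = [False] * n_grid
    # (immediate secondary reward, ACK prob. in B, ACK prob. in G) per action
    actions = ((0.0, g1, g3), ((1.0 - w) * r_s, g2, g4))
    for _ in range(max_iter):
        new_V, transmit = [], []
        for p in grid:
            totals = []
            for bonus, ack_bad, ack_good in actions:
                p_ack = ack_bad * p + ack_good * (1.0 - p)
                total = bonus + w * r_p * p_ack
                # Future reward, averaged over the ACK and NACK outcomes
                for lb, lg in ((ack_bad, ack_good), (1.0 - ack_bad, 1.0 - ack_good)):
                    nxt, prob = bayes_next(p, lb, lg, p_bb, p_gb)
                    if prob > 0.0:
                        total += alpha * prob * interp(V, nxt)
                totals.append(total)
            new_V.append(max(totals))
            transmit.append(totals[1] > totals[0])
        diff = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if diff < tol:
            break
    return grid, V, transmit
```

The returned `transmit` list is the numerical decision rule: the secondary terminal transmits at grid beliefs where the second term of (44) is the larger one.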
C. Three-state System Model

In this subsection, we address the three-state model, where the primary channel now follows a three-state Markov chain whose states are named Bad (B), Good (G) and Very good (Vg), with transition matrix $P$ where

$$P = \begin{bmatrix} P_{BB} & P_{BG} & P_{BVg} \\ P_{GB} & P_{GG} & P_{GVg} \\ P_{VgB} & P_{VgG} & P_{VgVg} \end{bmatrix}$$

If the secondary user is listening, the primary user can deliver its packet if the channel state is G or Vg. If the secondary user is transmitting, however, the primary transmission succeeds only in state Vg. This means that the primary and secondary users can both transmit successfully at the same time in state Vg. Table III shows the success or failure of the primary packet for the different channel states and secondary user activities.





|    | Sec. silent | Sec. transmit |
|----|-------------|---------------|
| B  | fail        | fail          |
| G  | success     | fail          |
| Vg | success     | success       |

TABLE III

Primary packet delivery status during different secondary user activity

We can also apply dynamic programming on that system with three channel states to arrive at 
the optimal decisions for the secondary user, whether to transmit or to listen, in any situation to 
maximize the weighted sum of the primary and secondary throughput. 

Here we parameterize the belief state by two parameters $p$ and $q$, where $p$ is the probability that the primary network is in state G in the next time slot and $q$ is the probability that the primary network is in state Vg in the next time slot. This implies that the probability that the primary network is in state B is $(1 - p - q)$. After each time slot, depending on the action taken by the secondary user and the corresponding feedback, $p$ and $q$ are updated according to one of the following four cases.

Case 1: Secondary user is silent and a NACK is received from the primary network. The NACK during secondary user silence implies that the primary network has been in state B and, thus, the primary receiver has failed to receive the packet. Therefore, the belief state in the next time slot is:

$$p(1) = P_{BG} \tag{45}$$
$$q(1) = P_{BVg} \tag{46}$$

where, as in the two-state case, $p(k)$ and $q(k)$ are the update expressions for $p$ and $q$, respectively, for the $k$th case.

Case 2: Secondary user is silent and an ACK is received from the primary network. The primary network could be in state G with probability $\frac{p}{p+q}$ or in state Vg with probability $\frac{q}{p+q}$. The belief state in the next time slot is:

$$p(2) = \frac{p}{p+q} P_{GG} + \frac{q}{p+q} P_{VgG} \tag{47}$$
$$q(2) = \frac{p}{p+q} P_{GVg} + \frac{q}{p+q} P_{VgVg} \tag{48}$$

Case 3: Secondary user is transmitting and an ACK is received from the primary network. The ACK during secondary user activity implies that the primary network has been in state Vg. Therefore, the belief state in the next time slot is:

$$p(3) = P_{VgG} \tag{49}$$
$$q(3) = P_{VgVg} \tag{50}$$

Case 4: Secondary user is transmitting and a NACK is received from the primary network. The primary network could be in state G with probability $\frac{p}{1-q}$ or in state B with probability $\frac{1-p-q}{1-q}$. The belief state in the next time slot is:

$$p(4) = \frac{p}{1-q} P_{GG} + \frac{1-p-q}{1-q} P_{BG} \tag{51}$$
$$q(4) = \frac{p}{1-q} P_{GVg} + \frac{1-p-q}{1-q} P_{BVg} \tag{52}$$

Let $Q_i(p,q)$, $i = 1, 2, 3, 4$, denote the probability that case $i$ above happens:

$$Q_1(p,q) = 1 - p - q \tag{53}$$
$$Q_2(p,q) = p + q \tag{54}$$
$$Q_3(p,q) = q \tag{55}$$
$$Q_4(p,q) = 1 - q \tag{56}$$
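The four update cases and their probabilities $Q_i$ can be collected into one routine. The sketch below is illustrative only: the function name, the dictionary layout of the transition matrix, and the numbers in `EXAMPLE_P` are hypothetical and are not the parameters used in the paper.

```python
# Hypothetical transition matrix for illustration, EXAMPLE_P[from][to].
EXAMPLE_P = {
    "B":  {"B": 0.80, "G": 0.15, "Vg": 0.05},
    "G":  {"B": 0.10, "G": 0.70, "Vg": 0.20},
    "Vg": {"B": 0.05, "G": 0.15, "Vg": 0.80},
}


def three_state_update(p, q, transmitting, ack, P):
    """Return (next p, next q, probability of this case) for cases 1 to 4.

    p and q are the current beliefs of states G and Vg. The caller should not
    request a case whose probability is zero (the posterior is then undefined).
    """
    b = 1.0 - p - q
    if not transmitting and not ack:      # Case 1: state was B
        posterior, prob = {"B": 1.0}, b
    elif not transmitting and ack:        # Case 2: state was G or Vg
        posterior, prob = {"G": p / (p + q), "Vg": q / (p + q)}, p + q
    elif transmitting and ack:            # Case 3: state was Vg
        posterior, prob = {"Vg": 1.0}, q
    else:                                 # Case 4: state was G or B
        posterior, prob = {"G": p / (1.0 - q), "B": b / (1.0 - q)}, 1.0 - q
    # One step of the Markov chain applied to the posterior over current states
    next_p = sum(wt * P[s]["G"] for s, wt in posterior.items())
    next_q = sum(wt * P[s]["Vg"] for s, wt in posterior.items())
    return next_p, next_q, prob
```

For a given action, the two case probabilities sum to one ($Q_1 + Q_2 = 1$ when silent, $Q_3 + Q_4 = 1$ when transmitting), which is a convenient sanity check when wiring this into a dynamic program.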

The parameters $p$ and $q$ characterizing the belief state are updated according to one of the previous four cases. If the secondary user is listening, the expected current gain can be calculated as:

$$G_1(p,q) = w r_p (p + q) \tag{57}$$

If, instead, the secondary user is transmitting, the expected current gain is:

$$G_2(p,q) = (1-w) r_s + w r_p q \tag{58}$$

The expected current reward is:

$$R(k,p,q,a) = \begin{cases} G_1(p,q) & \text{if the secondary user is silent in time slot } k \\ G_2(p,q) & \text{if the secondary user is active in time slot } k \end{cases}$$

The optimal strategy is the strategy that maximizes the following discounted reward function:

$$E\left\{ \sum_{n=k}^{K+k-1} \alpha^{n-k} R(n,p_n,q_n,a_n) \,\middle|\, p_k = p,\ q_k = q \right\} \tag{59}$$

$V^K(p,q)$ satisfies the following Bellman equation [11]:

$$V^K(p,q) = \max\Big\{ w r_p (p+q) + \alpha \sum_{i=1,2} Q_i(p,q) V^{K-1}(p(i),q(i)),\ (1-w) r_s + w r_p q + \alpha \sum_{i=3,4} Q_i(p,q) V^{K-1}(p(i),q(i)) \Big\} \tag{60}$$

where

$$V^1(p,q) = \max\{ w r_p (p+q),\ (1-w) r_s + w r_p q \} \tag{61}$$

When $K = \infty$, $V(p,q)$ denotes the maximum achievable discounted reward function. $V(p,q)$ satisfies the following Bellman equation [11]:

$$V(p,q) = \max\Big\{ w r_p (p+q) + \alpha \sum_{i=1,2} Q_i(p,q) V(p(i),q(i)),\ (1-w) r_s + w r_p q + \alpha \sum_{i=3,4} Q_i(p,q) V(p(i),q(i)) \Big\} \tag{62}$$

We solve Equation (60) iteratively by approximating the value function at a finite number of belief values on a two-dimensional grid representing $p$ and $q$ values. The value function is initialized and then (60) is used to update it. For $p$ and $q$ values not belonging to the grid, interpolation or extrapolation is used. After convergence, the secondary terminal decides whether to transmit or listen based on the term that maximizes $V(p,q)$ at each value of $p$ and $q$.
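A single application of the three-state Bellman update (60) can be written compactly once the previous value function is available as a callable. The sketch below assumes such a callable `V_prev(p, q)` (for example, an interpolator over the grid); that interface, the function name and the dictionary layout `P[from][to]` of the transition matrix are assumptions made for illustration.

```python
def three_state_backup(p, q, V_prev, w, P, r_p=1.0, r_s=1.0, alpha=0.9):
    """One Bellman backup of (60) at belief (p, q).

    Returns (value, action) with action in {"listen", "transmit"}.
    Branches with zero probability are skipped.
    """
    b = 1.0 - p - q

    def future(posterior):
        # Push a posterior over the current state through one Markov step
        nxt_p = sum(wt * P[s]["G"] for s, wt in posterior.items())
        nxt_q = sum(wt * P[s]["Vg"] for s, wt in posterior.items())
        return V_prev(nxt_p, nxt_q)

    listen = w * r_p * (p + q)
    if b > 0.0:                      # case 1: silent, NACK
        listen += alpha * b * future({"B": 1.0})
    if p + q > 0.0:                  # case 2: silent, ACK
        listen += alpha * (p + q) * future({"G": p / (p + q), "Vg": q / (p + q)})

    send = (1.0 - w) * r_s + w * r_p * q
    if q > 0.0:                      # case 3: transmitting, ACK
        send += alpha * q * future({"Vg": 1.0})
    if q < 1.0:                      # case 4: transmitting, NACK
        send += alpha * (1.0 - q) * future({"G": p / (1.0 - q), "B": b / (1.0 - q)})

    return (send, "transmit") if send > listen else (listen, "listen")
```

Calling this at every grid point with the previous iterate, and repeating until the values stop changing, gives the iterative solution described in the text; with `V_prev` identically zero the backup reduces to (61).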
III. Multiple Primary Channels

In this section, we extend the problem of a single primary channel to multiple independent 
erasure primary channels with different transition probabilities. The secondary user has to decide 
which primary channel to access. Here, we focus on the two channel case. However, this 
technique can be extended to the multi-channel case, though with increasing computational 
complexity. 

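We have two independent primary channels with two different transition matrices $P_1$ and $P_2$:

$$P_1 = \begin{bmatrix} P_{EE1} & P_{EN1} \\ P_{NE1} & P_{NN1} \end{bmatrix}, \qquad P_2 = \begin{bmatrix} P_{EE2} & P_{EN2} \\ P_{NE2} & P_{NN2} \end{bmatrix}$$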

The system model is the same as the two-state model, except that when the cognitive radio is silent, the secondary transmitter can eavesdrop on the ARQ feedback from both primary channels simultaneously. When the secondary user decides to transmit on a specific primary channel, it can still make use of the ARQ feedback from the other primary channel, but the ARQ feedback from the occupied channel carries no information, since simultaneous transmission is assumed to always result in a NACK.

Our objective is to choose the transmission strategy that maximizes the expected value of the weighted sum throughput $R$ given by

$$R = \lim_{K \to \infty} \frac{1}{K} \sum_{n=1}^{K} \Big( w \big( R_{p1}(n,p_n,q_n,a_n) + R_{p2}(n,p_n,q_n,a_n) \big) + (1-w) R_s(n,p_n,q_n,a_n) \Big) \tag{63}$$

where $R_{p1}(n)$ and $R_{p2}(n)$ are the primary throughputs of the first and second channels in time slot $n$, respectively, $R_s(n)$ is the secondary throughput in time slot $n$, $0 < w < 1$ is a weighting factor that determines the relative importance of the two rates, $p_n$ and $q_n$ are the beliefs of the two primary channels, and $a_n$ is the action taken by the secondary user in time slot $n$.

The cognitive radio action in time slot n, a n , is one of the following three different options: 
1. Secondary user listens to both primary channels. 

2. Secondary user transmits on the first channel and listens to the second channel. 
3. Secondary user transmits on the second channel and listens to the first channel. 
We can once more apply dynamic programming to this system to find the optimal decisions for the secondary user. The belief state is parameterized by two parameters $p$ and $q$, where $p$ is the probability that the first primary network is in the erasure state in the next time slot, and $q$ is the probability that the second primary network is in the erasure state in the next time slot. After each time slot, depending on the action taken by the secondary user and the corresponding feedback, $p$ is updated according to one of the following three cases:

Case 1: Secondary user is not transmitting on the first channel and an ACK is received from the first primary network. Therefore, the updated value of $p$ is:

$$p(1) = P_{NE1} \tag{64}$$

Case 2: Secondary user is not transmitting on the first channel and a NACK is received from the first primary network. Therefore, the updated value of $p$ is:

$$p(2) = P_{EE1} \tag{65}$$

Case 3: Secondary user is transmitting on the first channel. The probability of erasure in the next time slot, $p$, is updated by the Markovian property as follows.

$$p(3) = p P_{EE1} + (1-p) P_{NE1} \tag{66}$$

Similarly, the value of $q$ is updated according to one of the previous three cases, but for the second primary network:

$$q(1) = P_{NE2} \tag{67}$$
$$q(2) = P_{EE2} \tag{68}$$
$$q(3) = q P_{EE2} + (1-q) P_{NE2} \tag{69}$$

where $p(k)$ and $q(k)$ are the update expressions for $p$ and $q$, respectively, for the $k$th case. The parameters $p$ and $q$ are updated by one of the previous conditions according to the secondary user decision and the feedback of the primary networks.

Now, the secondary user has three possible decisions: to transmit on the first channel, to transmit on the second channel, or to remain idle and listen to the feedback of both channels. If the secondary user is idle, the expected current gain is:

$$G_1(p,q) = w \big( r_{p1}(1-p) + r_{p2}(1-q) \big) \tag{70}$$

If the secondary user transmits on the first channel, the expected current gain is:

$$G_2(p,q) = (1-w) r_s + w r_{p2}(1-q) \tag{71}$$

If the secondary user transmits on the second channel, the expected current gain is:

$$G_3(p,q) = (1-w) r_s + w r_{p1}(1-p) \tag{72}$$

Therefore, the expected current reward is:

$$R(k,p,q,a) = \begin{cases} G_1(p,q) & \text{if the secondary user is idle in time slot } k \\ G_2(p,q) & \text{if the secondary user is transmitting on the first channel in time slot } k \\ G_3(p,q) & \text{if the secondary user is transmitting on the second channel in time slot } k \end{cases}$$

The optimal strategy is the strategy that maximizes the following discounted reward function:

$$E\left\{ \sum_{n=k}^{K+k-1} \alpha^{n-k} R(n,p_n,q_n,a_n) \,\middle|\, p_k = p,\ q_k = q \right\} \tag{73}$$

$V^K(p,q)$ satisfies the following Bellman equation [11]:

$$V^K(p,q) = \max\Big\{ w \big( r_{p1}(1-p) + r_{p2}(1-q) \big) + \alpha(1-p)(1-q)V^{K-1}(p(1),q(1)) + \alpha(1-p)q V^{K-1}(p(1),q(2))$$
$$\qquad + \alpha p(1-q) V^{K-1}(p(2),q(1)) + \alpha p q V^{K-1}(p(2),q(2)),$$
$$(1-w) r_s + w r_{p2}(1-q) + \alpha q V^{K-1}(p(3),q(2)) + \alpha(1-q) V^{K-1}(p(3),q(1)),$$
$$(1-w) r_s + w r_{p1}(1-p) + \alpha p V^{K-1}(p(2),q(3)) + \alpha(1-p) V^{K-1}(p(1),q(3)) \Big\} \tag{74}$$

where

$$V^1(p,q) = \max\{ w ( r_{p1}(1-p) + r_{p2}(1-q) ),\ (1-w) r_s + w r_{p2}(1-q),\ (1-w) r_s + w r_{p1}(1-p) \} \tag{75}$$

When $K = \infty$, $V(p,q)$ denotes the maximum achievable discounted reward function, as before.

We solve Equation (74) iteratively by approximating the value function at a finite number of belief values on a two-dimensional grid representing $p$ and $q$ values. The value function is initialized and then (74) is used to update it. For $p$ and $q$ values not belonging to the grid, interpolation or extrapolation is used. After convergence, the secondary terminal decides whether to transmit on the first channel, transmit on the second channel or listen to both channels based on the term that maximizes $V(p,q)$ for each value of $p$ and $q$.
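One application of the two-channel Bellman update (74), including the belief updates (64)-(69), can be sketched as follows. The function name, the action labels, and the interface `V_prev(p, q)` for the previous value function (for example, an interpolator over the grid) are assumptions made for illustration.

```python
def two_channel_backup(p, q, V_prev, w, ch1, ch2,
                       r_p1=1.0, r_p2=1.0, r_s=1.0, alpha=0.9):
    """One Bellman backup of (74) at belief (p, q).

    ch1 = (P_EE1, P_NE1) and ch2 = (P_EE2, P_NE2).
    Returns (value, action name).
    """
    p_ee1, p_ne1 = ch1
    p_ee2, p_ne2 = ch2
    # Belief updates (64)-(69): after an ACK, after a NACK, and with no feedback
    p_ack, p_nack, p_blind = p_ne1, p_ee1, p * p_ee1 + (1.0 - p) * p_ne1
    q_ack, q_nack, q_blind = p_ne2, p_ee2, q * p_ee2 + (1.0 - q) * p_ne2

    listen = (w * (r_p1 * (1.0 - p) + r_p2 * (1.0 - q))
              + alpha * ((1.0 - p) * (1.0 - q) * V_prev(p_ack, q_ack)
                         + (1.0 - p) * q * V_prev(p_ack, q_nack)
                         + p * (1.0 - q) * V_prev(p_nack, q_ack)
                         + p * q * V_prev(p_nack, q_nack)))
    send_ch1 = ((1.0 - w) * r_s + w * r_p2 * (1.0 - q)
                + alpha * (q * V_prev(p_blind, q_nack)
                           + (1.0 - q) * V_prev(p_blind, q_ack)))
    send_ch2 = ((1.0 - w) * r_s + w * r_p1 * (1.0 - p)
                + alpha * (p * V_prev(p_nack, q_blind)
                           + (1.0 - p) * V_prev(p_ack, q_blind)))

    options = {"listen": listen, "transmit_ch1": send_ch1, "transmit_ch2": send_ch2}
    best = max(options, key=options.get)
    return options[best], best
```

With `V_prev` identically zero the backup reduces to (75). Intuitively, when the first channel is believed to be in erasure and the second is not, transmitting on the first channel costs the primary users little, which is what the first term-by-term comparison shows for such beliefs.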
IV. Hidden Markov model (HMM)

In the previous sections, we assumed that the secondary user has full knowledge of the transition probabilities of the primary network system state. Here, we assume that the cognitive user is not aware of these transition probabilities. We divide the time horizon into two intervals. The first interval is the training period, during which the secondary user "learns" the transition probabilities. The second interval is the transmission interval, during which the secondary terminal uses the estimated probabilities to derive the optimal transmission policy based on the algorithms presented in the paper so far. We propose using an HMM to estimate the probabilities by listening to the feedback channel.

The basic idea behind an HMM is that the actual system state is unknown. The only available information is the observation vector, which is a probabilistic function of the state. For example, the three-state model can be described as an HMM where the hidden states of the model are B, G and Vg, as mentioned in Section II-C. The ARQ feedback vector (ACK or NACK) is considered the observation vector. This feedback is generated, as mentioned above, according to Table III. We now apply the HMM framework to some of the previously mentioned models.

A. Two-State System Model 

This model is exactly the same as the two-state system model discussed in Section II, except that the transition probabilities are unknown. Hence, the secondary transmitter observes the ARQ feedback from the primary network and uses this observation vector to estimate the state transition probabilities of the primary network using an HMM-based scheme. The HMM framework can be applied to both the Erasure/Non-Erasure channel model and the Gilbert-Elliot model.

Once the transition probabilities are estimated, the same dynamic programming procedure presented in Section II is executed to reach the optimal transmission strategy for the secondary user, which maximizes the weighted sum throughput of the primary and secondary networks.
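For intuition only, consider the special case of the Erasure/Non-Erasure model with a silent secondary user. There, every ACK reveals state N and every NACK reveals state E, so the chain is fully observed and the transition probabilities can be estimated by counting transitions. The sketch below implements this simpler counting estimator as a baseline; it is not the HMM-based scheme proposed in the paper (which is needed when feedback does not reveal the state, as in the Gilbert-Elliot model), and the function name is hypothetical.

```python
def estimate_transitions(feedback):
    """Estimate (P_EE, P_NE) from feedback overheard while silent.

    feedback: sequence of "ACK" / "NACK" strings, one per time slot.
    Assumes the erasure model, where ACK means state N and NACK means state E.
    Returns None for a probability whose source state was never visited.
    """
    states = ["N" if f == "ACK" else "E" for f in feedback]
    counts = {("E", "E"): 0, ("E", "N"): 0, ("N", "E"): 0, ("N", "N"): 0}
    for prev, cur in zip(states, states[1:]):
        counts[(prev, cur)] += 1
    from_e = counts[("E", "E")] + counts[("E", "N")]
    from_n = counts[("N", "E")] + counts[("N", "N")]
    p_ee = counts[("E", "E")] / from_e if from_e else None
    p_ne = counts[("N", "E")] / from_n if from_n else None
    return p_ee, p_ne
```

Longer feedback sequences give more transitions to count, which is the same qualitative effect as lengthening the observation vector in the HMM-based scheme.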

B. Three-State System Model 

The HMM framework is also applied to the three-state model, but there is one main difference from the two-state model. In the two-state model, the secondary user is always silent during the transition probability estimation phase, because as long as the secondary user is silent, it obtains the maximum knowledge about the state of the primary channel, which helps it estimate the probabilities correctly. In the three-state model, on the other hand, the estimation of the transition probabilities can be refined by both secondary user silence and secondary user transmission. During silence, there is no uncertainty in the detection of state B, because receiving a NACK indicates directly that the previous state was B; if an ACK is received, however, it is uncertain whether the previous state was G or Vg. Likewise, during secondary user transmission, there is no uncertainty in the detection of state Vg. A learning scheme would therefore have to combine silence and transmission during the training phase. For example, the HMM learning procedure can start with the secondary user being silent, and the resulting estimates can then be used as the initial condition for a second learning phase during which the secondary user is transmitting.

V. Simulation Results 

For all our simulation results, the belief is randomly initialized from a uniform distribution; over the course of the simulation, it converges to the actual belief. For the two-state erasure/non-erasure channel model, we use the following system parameters: $P_{EE} = 0.99$, $P_{NE} = 0.01$, $r_p = 1$ and $r_s = 1$. The weighted sum of the primary and secondary throughput is shown in Figure 4, which shows a significant gain in throughput for our proposed scheme over the greedy scheme inside the region of interest $w > 0.5$. Figure 4 also shows that our optimal scheme is very close to the causal-genie scheme. The causal-genie scheme is an upper bound on the performance, in which we assume that a genie informs the secondary user of the previous primary channel state. Figure 5 shows the optimal number $M$ of consecutive packets transmitted by the secondary user versus the weighting factor $w$. We can see from Figure 5 that in the greedy scheme, the secondary transmitter always transmits ($M$ is infinite) as long as $w < 0.67$, which explains the sudden change in the overall throughput at $w = 0.67$ in Figure 4. The optimal strategy has this threshold at $w = 0.5$, which means that the optimal secondary user strategy benefits from learning the channel state rather than transmitting to maximize its immediate reward. Our proposed scheme spans the boundary of the primary-secondary rate region at a number of points where $M$ has an integer value. The piecewise linear connection between these points can be achieved by time division multiplexing between different integer values of $M$. For system parameters $r_p = 1$, $r_s = 1$ with the same transition probabilities as above, the rate region is shown in Figure 6. The rate region is obtained by solving the optimization problem for various values of the weight $w$.
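The two thresholds quoted above can be checked by hand. The following is a sketch under an assumption made here for illustration: a greedy secondary user that always transmits receives no informative feedback, so its belief settles at the steady-state erasure probability, and the greedy rule compares immediate rewards only. With $P_{EE} = 0.99$, $P_{NE} = 0.01$ and $r_p = r_s = 1$:

$$P(E) = \frac{P_{NE}}{1 + P_{NE} - P_{EE}} = \frac{0.01}{1 + 0.01 - 0.99} = 0.5$$

$$w r_p (1 - P(E)) < (1-w) r_s \iff 0.5\,w < 1 - w \iff w < \tfrac{2}{3} \approx 0.67$$

For the optimal strategy, the always-transmit condition $w r_p < (1-w) r_s$ from Appendix VII-A reduces to $w < 1 - w$, i.e., $w < 0.5$, which agrees with the threshold at $w = 0.5$.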

For the three-state model, the system parameters used for generating the simulation results are as follows: $P_{BB} = P_{GG} = P_{VgVg} = 0.9$, $P_{BG} = P_{GB} = P_{VgG} = 0.005$, $r_p = 1$ and $r_s = 1$. The weighted sum of the primary and secondary throughput is shown in Figure 7. This again shows that our optimization scheme achieves higher throughput than the greedy scheme inside the region of interest $w > 0.5$, and that our scheme, which maximizes the total future throughput, falls between the greedy scheme, which maximizes only the instantaneous reward, and the upper bound of the causal genie.

For the case of two primary channels presented in Section III, we show the throughput that we obtain using our optimal transmission strategy compared to the simple greedy scheme. The two independent primary channels are identical, with transition probabilities $P_{EE1} = P_{EE2} = 0.99$ and $P_{NE1} = P_{NE2} = 0.01$. The secondary user reward is $r_s = 1$ and the primary channel rewards are $r_{p1} = r_{p2} = 1$. The weighted sum of the primary and secondary throughput is shown in Figure 8. This figure also shows that our proposed scheme is better than the greedy scheme over a range of values of the weight $w$.

For the Gilbert-Elliot model, Table II shows the probability of receiving an ACK from the primary network for the different channel states and secondary user decisions. The numerical parameters used for the simulation are $\gamma_1 = 0.2$, $\gamma_2 = 0.01$, $\gamma_3 = 0.95$ and $\gamma_4 = 0.3$. The system transition probabilities are $P_{EE} = 0.8$ and $P_{NE} = 0.1$. Figure 9 shows a minor increase in the overall throughput of our proposed scheme over the greedy scheme.

To evaluate our proposed algorithm for learning the transition probabilities, we simulated the two-state model as follows. For the Erasure/Non-Erasure channel model, we use a sequence of observations representing the ARQ feedback to estimate the primary channel transition probabilities using the HMM-based scheme. The actual transition probabilities to be estimated are $P_{EE} = 0.99$ and $P_{NE} = 0.01$. The estimates of the transition probabilities versus the observation vector length are shown in Figure 10. We can see that the estimation is refined by increasing the length of the observation vector. Figure 11 shows the degradation in the overall throughput when the optimal strategy is computed from probabilities estimated by the HMM-based scheme, using 30 and 100 feedback packets as the observation vector, instead of from the actual probabilities. Figure 11 also shows how a longer observation vector refines the estimates of the transition probabilities and consequently decreases the throughput degradation. Figure 12 shows the throughput degradation as a function of the observation vector length at a fixed weight $w = 0.6$. These results show that we can accurately estimate the transition probabilities and suffer limited loss in throughput if we use around 100 packets in our learning algorithm.

For the Gilbert-Elliot model, the actual transition probabilities we estimate are $P_{EE} = 0.8$ and $P_{NE} = 0.1$. The estimates of the transition probabilities versus the observation vector length are shown in Figure 13. As expected, increasing the length of the observation vector leads to better estimation.

VI. Conclusion 

In this work, the ACK/NACK feedback from the primary receiver is exploited by the secondary 
transmitter in order to find optimal transmission strategies that maximize the weighted sum 
of primary and secondary throughput. We have formulated the problem for a general number of primary channel states S. We then tackled the two-state system
model, where we have shown that the optimal transmission strategy is to transmit for a fixed 
number of packets, M, after hearing a NACK. We derived an expression for M and derived a 
closed-form expression of the optimal overall throughput. For the Gilbert-Elliot model, we have 
shown that the optimal strategy is a threshold based policy on the belief of the secondary user 
that the primary link is in the erasure state, and used dynamic programming to obtain the optimal 
transmission strategy. We have then solved the problem for the case of three channel states and 
used dynamic programming to obtain the optimal secondary user policy. For the multiple primary 
channels model, we derived the optimal strategy for the secondary user to choose the primary 
channel on which to transmit in order to maximize the current and future reward. In the case 
of unknown Markov chain transition probabilities, we proposed using an HMM to estimate these probabilities, and have shown via simulations that close-to-optimal performance can be obtained by using a long enough training period.

VII. Appendices 

A. Transmission policy of the two-state system is threshold based 

We show in this appendix that the optimal policy of the two-state system (Erasure/Non-Erasure
channel model) is threshold based in the belief p of the secondary user that the primary channel
is in erasure. The reward function $V^K(p)$ is defined as follows.

$$V^K(p) = \max\Big\{ w r_p (1-p) + (1-p)V^{K-1}(p(1)) + pV^{K-1}(p(2)),\; (1-w) r_s + V^{K-1}(p(3)) \Big\} \tag{76}$$

where

$$V^1(p) = \max\{ w r_p (1-p),\; (1-w) r_s \} \tag{77}$$

Here $p(1) = P_{NE}$ and $p(2) = P_{EE}$ are the beliefs after the secondary user listens and hears
an ACK or a NACK, respectively, and $p(3) = pP_{EE} + (1-p)P_{NE}$ is the belief after it transmits.

If $w r_p < (1-w) r_s$, then $w r_p (1-p) < (1-w) r_s$, since $0 \le p \le 1$.
This means that

$$V^1(p) = (1-w) r_s \tag{78}$$

$$V^2(p) = \max\Big\{ w r_p (1-p) + (1-p)V^1(P_{NE}) + pV^1(P_{EE}),\; (1-w) r_s + V^1(pP_{EE} + (1-p)P_{NE}) \Big\} \tag{79}$$

$$V^2(p) = \max\{ w r_p (1-p) + (1-w) r_s,\; (1-w) r_s + (1-w) r_s \} \tag{80}$$

$$V^2(p) = 2(1-w) r_s \tag{81}$$

Thus, if $w r_p < (1-w) r_s$, the transmit term attains the maximum at every stage, and the reward
per slot is always

$$V(p) = (1-w) r_s \tag{82}$$

Hence, the secondary user should always transmit.
On the other hand, if $w r_p > (1-w) r_s$,

$$V^1(p) = \max\{ w r_p (1-p),\; (1-w) r_s \} \tag{83}$$

Since the maximum of two convex, non-increasing functions is also convex and non-increasing,
$V^1(p)$ is a convex and non-increasing function of p.
We now use induction to show that $V^K(p)$ is convex and non-increasing for all K.
Assume that $V^{K-1}(p)$ is a convex and non-increasing function. We have

$$V^K(p) = \max\Big\{ w r_p (1-p) + (1-p)V^{K-1}(P_{NE}) + pV^{K-1}(P_{EE}),\; (1-w) r_s + V^{K-1}\big(p(P_{EE} - P_{NE}) + P_{NE}\big) \Big\} \tag{84}$$

The first term is linear in p, i.e., it is convex. The second term is also convex in p, because it
composes the convex function $V^{K-1}$ with an affine function of p. Hence, by induction,
$V^K(p)$ is convex in p. Now, since $P_{EE} > P_{NE}$ (assumption) and $V^{K-1}(p)$ is a
non-increasing function of p, we have $V^{K-1}(P_{EE}) \le V^{K-1}(P_{NE})$. Thus, the first term,
whose slope in p is $-w r_p + V^{K-1}(P_{EE}) - V^{K-1}(P_{NE}) \le 0$, is non-increasing in p, and
the second term is also non-increasing in p, because its argument increases with p. We deduce that
$V^K(p)$ is a non-increasing function of p. Hence, $V^K(p)$ is a convex and non-increasing function
of p for all K.
At steady state,

$$V(p) = \max\Big\{ w r_p (1-p) + (1-p)V(P_{NE}) + pV(P_{EE}),\; (1-w) r_s + V(pP_{EE} + (1-p)P_{NE}) \Big\} \tag{85}$$

Since V(p) is convex,

$$V(pP_{EE} + (1-p)P_{NE}) \le pV(P_{EE}) + (1-p)V(P_{NE}) \tag{86}$$

Since $(1-w) r_s < w r_p$ (otherwise the secondary user is always transmitting),

$$(1-w) r_s + V(pP_{EE} + (1-p)P_{NE}) < w r_p + pV(P_{EE}) + (1-p)V(P_{NE}) \tag{87}$$

Therefore, the first and second terms of Equation (85) satisfy

$$\text{2nd term} < \text{1st term} + w r_p\, p$$

Then, if p = 0, the 2nd term is smaller than the 1st term; thus, the decision is to listen.
If p = 1,

$$V(p) = \max\{ V(P_{EE}),\; (1-w) r_s + V(P_{EE}) \} \tag{88}$$

so $V(p) = (1-w) r_s + V(P_{EE})$. Thus, the decision is to transmit.

Since the first and second terms of Equation (85) are convex and non-increasing functions of p,
the optimal strategy is a threshold-based strategy in p, i.e., there is a unique threshold value of
p, called $p_{th}$, at which the secondary user switches from listening to transmitting. It is
the intersection of the first and second terms of Equation (85) plotted versus p.
Due to the convexity of the first and second terms of Equation (85), the intersection $p_{th}$ has
upper and lower bounds as follows.



:i " W)r " < P„, < 1 - ^^Pne (89) 



wr p wr p 



If $P_{EE} < p_{th}$, the secondary user always listens, whether the channel is in erasure or not,
and if $P_{NE} > p_{th}$, it always transmits, whether the channel is in erasure or not.
Thus, for the secondary user to transmit a finite number of packets, we must have
$P(E) < p_{th} < P_{EE}$, where P(E) is the stationary probability of being in erasure, as also
noted in the main text.
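As a numerical illustration of the argument in this appendix, the following Python sketch runs the finite-horizon recursion for $V^K(p)$ on a discretized belief grid and records where transmitting beats listening. The grid size, the horizon, the linear interpolation of $V^{K-1}$, and the parameter values (w = 0.6, r_p = r_s = 1, P_EE = 0.8, P_NE = 0.2) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def two_state_value_iteration(w, r_p, r_s, p_ee, p_ne, horizon=50, n_grid=1001):
    """Finite-horizon DP for V^K(p) on a belief grid (p = belief of erasure).

    Returns the grid, V^K on the grid, and a boolean mask that is True where
    transmitting is strictly better than listening at the last stage.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    listen = w * r_p * (1.0 - grid)
    transmit = np.full(n_grid, (1.0 - w) * r_s)
    v = np.maximum(listen, transmit)            # V^1(p)
    policy = transmit > listen
    for _ in range(horizon - 1):
        v_ne = np.interp(p_ne, grid, v)         # V^{K-1}(P_NE): listened, heard ACK
        v_ee = np.interp(p_ee, grid, v)         # V^{K-1}(P_EE): listened, heard NACK
        listen = w * r_p * (1.0 - grid) + (1.0 - grid) * v_ne + grid * v_ee
        next_belief = grid * p_ee + (1.0 - grid) * p_ne   # belief after transmitting
        transmit = (1.0 - w) * r_s + np.interp(next_belief, grid, v)
        policy = transmit > listen
        v = np.maximum(listen, transmit)
    return grid, v, policy

# Assumed example parameters with w*r_p > (1-w)*r_s.
grid, v, policy = two_state_value_iteration(w=0.6, r_p=1.0, r_s=1.0, p_ee=0.8, p_ne=0.2)
switches = int(np.count_nonzero(np.diff(policy.astype(int))))  # listen/transmit changes
p_th = float(grid[np.argmax(policy)])                          # first grid point that transmits
print(f"policy switches: {switches}, estimated threshold: {p_th:.3f}")
```

In this sketch the listening term is affine in p and the transmitting term is a convex function of p, so their difference changes sign at most once on the grid, which is the threshold structure argued above.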

B. Quasi-concavity of the overall rate

Here we show that the expected throughput R(M) (defined in Equation (28)) has only a
single peak for M > 0. We do so by showing that the equation R(M + 1) - R(M) = 0
has only one solution. Using the scheme defined for the two-state model, the expected overall
throughput R(M) is given by

$$R(M) = wR_p + (1-w)R_s = \frac{w r_p \big(1 - T^M(P_{EE})\big) + (1-w) r_s M P_{NE}}{1 - T^M(P_{EE}) + P_{NE} + M P_{NE}} \tag{90}$$

where

$$T^M(P_{EE}) = \frac{P_{NE} + (P_{EE} - P_{NE})^{M+1}(1 - P_{EE})}{1 + P_{NE} - P_{EE}} \tag{91}$$

We have

$$R(M) = \frac{w r_p \big(1 - P_{EE} - (1 - P_{EE})(P_{EE} - P_{NE})^{M+1}\big) + (1-w) r_s M P_{NE}(1 + P_{NE} - P_{EE})}{1 - P_{EE} - (1 - P_{EE})(P_{EE} - P_{NE})^{M+1} + (M+1)P_{NE}(1 + P_{NE} - P_{EE})} \tag{92}$$
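The closed forms above can be checked numerically. The sketch below compares $T^M(P_{EE})$ from Equation (91) with a direct iteration of the belief map $p \mapsto pP_{EE} + (1-p)P_{NE}$ started at $P_{EE}$, and compares the two expressions for R(M) in Equations (90) and (92). The function names and the parameter values are illustrative assumptions.

```python
def belief_after_transmissions(m, p_ee, p_ne):
    """Iterate p -> p*P_EE + (1-p)*P_NE m times, starting from P_EE."""
    p = p_ee
    for _ in range(m):
        p = p * p_ee + (1.0 - p) * p_ne
    return p

def t_closed_form(m, p_ee, p_ne):
    """Closed form of T^M(P_EE), Equation (91)."""
    return (p_ne + (p_ee - p_ne) ** (m + 1) * (1.0 - p_ee)) / (1.0 + p_ne - p_ee)

def throughput_cycle_form(m, w, r_p, r_s, p_ee, p_ne):
    """R(M) written in terms of T^M(P_EE), Equation (90)."""
    t = t_closed_form(m, p_ee, p_ne)
    num = w * r_p * (1.0 - t) + (1.0 - w) * r_s * m * p_ne
    den = 1.0 - t + p_ne + m * p_ne
    return num / den

def throughput_expanded_form(m, w, r_p, r_s, p_ee, p_ne):
    """R(M) after substituting T^M(P_EE), Equation (92)."""
    decay = (1.0 - p_ee) * (p_ee - p_ne) ** (m + 1)
    scale = 1.0 + p_ne - p_ee
    num = w * r_p * (1.0 - p_ee - decay) + (1.0 - w) * r_s * m * p_ne * scale
    den = 1.0 - p_ee - decay + (m + 1) * p_ne * scale
    return num / den

# Assumed example parameters.
params = dict(w=0.6, r_p=1.0, r_s=1.0, p_ee=0.8, p_ne=0.2)
max_gap_t = max(abs(belief_after_transmissions(m, 0.8, 0.2) - t_closed_form(m, 0.8, 0.2))
                for m in range(20))
max_gap_r = max(abs(throughput_cycle_form(m, **params) - throughput_expanded_form(m, **params))
                for m in range(20))
r_zero = throughput_cycle_form(0, **params)   # never transmitting: w*r_p*P(non-erasure)
print(max_gap_t, max_gap_r, r_zero)
```

For M = 0 the expression reduces to $w r_p (1 - P_{EE})/(1 - P_{EE} + P_{NE})$, the primary throughput weighted by w when the secondary user never transmits, which is a useful sanity check on the formula.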



Let

$$A = 1 - P_{EE}, \qquad B = P_{EE} - P_{NE}, \qquad P = P_{NE}$$

Thus, we have

$$R(M) = \frac{w r_p (A - AB^{M+1}) + (1-w) r_s M P(1-B)}{A - AB^{M+1} + (M+1)P(1-B)} \tag{93}$$

Similarly, we have

$$R(M+1) = \frac{w r_p A(1 - B^{M+2}) + (1-w) r_s (M+1)P(1-B)}{A(1 - B^{M+2}) + (M+2)P(1-B)} \tag{94}$$

Note that as M grows to infinity, we have R(M) ≈ R(M + 1). For finite M, if there is a peak
at a certain value of M, then R(M + 1) - R(M) = 0 has a root there. Hence, we want to solve
the difference equation R(M + 1) - R(M) = 0 for M. By substituting in the rate difference
equation and applying some algebraic manipulations, we have

$$\begin{aligned}
& w r_p AP(M+1)(1-B)(1 - B^{M+2}) - w r_p AP(M+2)(1-B)(1 - B^{M+1}) \\
& + (1-w) r_s (M+1)AP(1-B)(1 - B^{M+1}) - (1-w) r_s MAP(1-B)(1 - B^{M+2}) \\
& + (1-w) r_s P^2(1-B)^2 = 0
\end{aligned} \tag{95}$$

Let

$$a = w r_p AP(1-B), \qquad b = (1-w) r_s AP(1-B)$$

Thus, we have

$$\big((a-b)M + a\big)(1 - B^{M+2}) - \big((a-b)M + a\big)(1 - B^{M+1}) + (b-a)(1 - B^{M+1}) = -(1-w) r_s P^2(1-B)^2 \tag{96}$$

$$\big((a-b)M + a\big)(1-B)B^{M+1} + (a-b)B^{M+1} = (a-b) - (1-w) r_s P^2(1-B)^2 \tag{97}$$

Let

$$C = (a-b)(1-B), \qquad D = (a-b) + a(1-B)$$

Now, we have

$$B^{M+1}(CM + D) = (a-b) - (1-w) r_s P^2(1-B)^2 \tag{98}$$

$$(M+1)\log B + \log(CM + D) = \log\big((a-b) - (1-w) r_s P^2(1-B)^2\big) \tag{99}$$

$$(CM + D)\log B + C\log(CM + D) = C\log\big((a-b) - (1-w) r_s P^2(1-B)^2\big) + (D - C)\log B \tag{100}$$
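The algebra leading from the rate difference to Equation (98) can be sanity-checked numerically. Writing R(M) = N(M)/D(M) as in Equation (93), the cross-multiplied difference N(M+1)D(M) - N(M)D(M+1) should equal $B^{M+1}(CM + D) - [(a-b) - (1-w) r_s P^2(1-B)^2]$. The sketch below evaluates both sides; the function names and parameter values are illustrative assumptions.

```python
def rate_terms(m, w, r_p, r_s, p_ee, p_ne):
    """Numerator and denominator of R(M) in the A, B, P notation of Equation (93)."""
    a_cap, b_cap, p_cap = 1.0 - p_ee, p_ee - p_ne, p_ne
    core = a_cap * (1.0 - b_cap ** (m + 1))
    num = w * r_p * core + (1.0 - w) * r_s * m * p_cap * (1.0 - b_cap)
    den = core + (m + 1) * p_cap * (1.0 - b_cap)
    return num, den

def cross_difference(m, w, r_p, r_s, p_ee, p_ne):
    """N(M+1)*D(M) - N(M)*D(M+1); it has the same sign as R(M+1) - R(M)."""
    n0, d0 = rate_terms(m, w, r_p, r_s, p_ee, p_ne)
    n1, d1 = rate_terms(m + 1, w, r_p, r_s, p_ee, p_ne)
    return n1 * d0 - n0 * d1

def compact_form(m, w, r_p, r_s, p_ee, p_ne):
    """B^(M+1)*(C*M + D) minus the right-hand side of Equation (98)."""
    a_cap, b_cap, p_cap = 1.0 - p_ee, p_ee - p_ne, p_ne
    a = w * r_p * a_cap * p_cap * (1.0 - b_cap)
    b = (1.0 - w) * r_s * a_cap * p_cap * (1.0 - b_cap)
    c_coef = (a - b) * (1.0 - b_cap)
    d_coef = (a - b) + a * (1.0 - b_cap)
    rhs = (a - b) - (1.0 - w) * r_s * p_cap ** 2 * (1.0 - b_cap) ** 2
    return b_cap ** (m + 1) * (c_coef * m + d_coef) - rhs

# Assumed example parameters.
params = dict(w=0.6, r_p=1.0, r_s=1.0, p_ee=0.8, p_ne=0.2)
max_gap = max(abs(cross_difference(m, **params) - compact_form(m, **params))
              for m in range(30))
print(max_gap)
```

Since both denominators are positive, the sign of R(M + 1) - R(M) is the sign of the compact expression, which is why the roots of Equation (98) locate the peak of R(M).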
Let

$$M_1 = CM + D$$

So, we have

$$M_1 \log B + C\log M_1 = C\log\big((a-b) - (1-w) r_s P^2(1-B)^2\big) + (D - C)\log B \tag{101}$$

$$C\log M_1 = K - M_1 \log B \tag{102}$$

where $K = C\log\big((a-b) - (1-w) r_s P^2(1-B)^2\big) + (D - C)\log B$ is a constant independent
of $M_1$. The solution of the previous equation is the intersection of a linear function of $M_1$
and a logarithmic function of $M_1$, i.e., the previous equation has at most two solutions.

Note that when $w r_p < (1-w) r_s$, the term $(a-b) - (1-w) r_s P^2(1-B)^2$ is negative, i.e., the
secondary user should always transmit, which is proved in Appendix VII-A.

Otherwise, when $(1-w) r_s < w r_p$, we have

$$\log M_1 = \frac{K - M_1 \log B}{C} \tag{103}$$

Now the roots of the equation R(M + 1) - R(M) = 0 are the solutions of the above equation, i.e.,
the intersections of a logarithmic function of $M_1$ and a linear function of $M_1$. Note that
both functions are monotonically increasing (B < 1), and the slope of a logarithm is a decreasing
function. Hence, if a logarithmic function and a linear function intersect at more than one point,
the slope of the logarithmic function has to be higher than that of the linear function at the
first intersection point. We now show that the maximum slope of the logarithmic function is less
than the slope of the linear function.

Since $C = (a-b)(1-B) > 0$, the minimum value of $M_1 = CM + D$ is D, which is achieved when
M = 0. Thus, the maximum slope of the LHS of Equation (103) with respect to $M_1$ is 1/D, and the
slope of the RHS is $-\log B / C$.
Now we need to show that $\frac{1}{D} < \frac{-\log B}{C}$. We have

$$\frac{C}{D} = \frac{(a-b)(1-B)}{(a-b) + a(1-B)} = \frac{\left(1 - \frac{b}{a}\right)(1-B)}{\left(1 - \frac{b}{a}\right) + (1-B)} = \frac{\left(1 - \frac{(1-w) r_s}{w r_p}\right)(1-B)}{\left(1 - \frac{(1-w) r_s}{w r_p}\right) + (1-B)} = \frac{XY}{X+Y}$$

where $X = 1 - \frac{(1-w) r_s}{w r_p}$ and $Y = 1 - B$.

By substitution, we have $-\log B = \log\left(\frac{1}{1-Y}\right)$.
Thus, we have

$$\frac{XY}{X+Y} < \frac{Y}{1+Y} < Y < \log\left(\frac{1}{1-Y}\right) \tag{104}$$

which means that

$$\frac{1}{D} < \frac{-\log B}{C} \tag{105}$$

Equation (105) shows that the maximum slope of the logarithmic term in the LHS of Equation
(103) is always less than the slope of the linear term in the RHS of Equation (103) (recall
that both the linear and logarithmic functions are monotonic). Thus, the linear and
logarithmic terms of Equation (103) intersect only once.

Thus, Equation (102) has a unique solution in $M_1$, which leads to
the optimal number M of consecutive secondary user transmissions.
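The single-peak property and the inequality chain in Equation (104) can be probed numerically. The sketch below scans R(M) over a finite range of M, counts how often the consecutive differences change sign, and checks the chain on a grid of X and Y values in (0, 1). The scan range, the grid, and the parameter values are illustrative assumptions, and a finite scan is only a spot check, not a proof.

```python
import math

def weighted_throughput(m, w, r_p, r_s, p_ee, p_ne):
    """R(M) in the A, B, P notation of Equation (93)."""
    a_cap, b_cap, p_cap = 1.0 - p_ee, p_ee - p_ne, p_ne
    core = a_cap * (1.0 - b_cap ** (m + 1))
    num = w * r_p * core + (1.0 - w) * r_s * m * p_cap * (1.0 - b_cap)
    den = core + (m + 1) * p_cap * (1.0 - b_cap)
    return num / den

def count_sign_changes(values):
    """Number of times the consecutive differences of `values` change sign."""
    signs = [later > earlier for earlier, later in zip(values, values[1:])]
    return sum(s != t for s, t in zip(signs, signs[1:]))

def chain_104_holds(x, y):
    """Check XY/(X+Y) < Y/(1+Y) < Y < log(1/(1-Y)) for 0 < X, Y < 1."""
    return x * y / (x + y) < y / (1.0 + y) < y < math.log(1.0 / (1.0 - y))

# Assumed example parameters with (1-w)*r_s < w*r_p.
params = dict(w=0.6, r_p=1.0, r_s=1.0, p_ee=0.8, p_ne=0.2)
rates = [weighted_throughput(m, **params) for m in range(101)]
best_m = max(range(101), key=lambda m: rates[m])   # location of the peak
sign_changes = count_sign_changes(rates)           # 1 means a single peak in the scan
chain_ok = all(chain_104_holds(i / 20, j / 20)
               for i in range(1, 20) for j in range(1, 20))
print(best_m, sign_changes, chain_ok)
```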

C. Transmission policy of Gilbert-Elliot model is a threshold based policy 

As mentioned in Section III-B, imperfect ACK/NACK reception is assumed. Table I shows
the probability of receiving an ACK from the primary network for the different channel states and
secondary user decisions. The reward function $V^K(p)$ is defined as follows.

$$\begin{aligned}
V^K(p) = \max\Big\{ & w r_p P(\mathrm{ACK}|\mathrm{idle}) + P(\mathrm{ACK}|\mathrm{idle})V^{K-1}(p(4)) + \big(1 - P(\mathrm{ACK}|\mathrm{idle})\big)V^{K-1}(p(3)), \\
& (1-w) r_s + w r_p P(\mathrm{ACK}|\mathrm{transmit}) + P(\mathrm{ACK}|\mathrm{transmit})V^{K-1}(p(2)) + \big(1 - P(\mathrm{ACK}|\mathrm{transmit})\big)V^{K-1}(p(1)) \Big\}
\end{aligned} \tag{106}$$

where

$$p(1) = P(E|\mathrm{NACK}, \mathrm{transmit})P_{EE} + \big(1 - P(E|\mathrm{NACK}, \mathrm{transmit})\big)P_{NE} = \frac{(1-\gamma_2)p}{(1-\gamma_2)p + (1-\gamma_4)(1-p)}[P_{EE} - P_{NE}] + P_{NE}$$

$$p(2) = P(E|\mathrm{ACK}, \mathrm{transmit})P_{EE} + \big(1 - P(E|\mathrm{ACK}, \mathrm{transmit})\big)P_{NE} = \frac{\gamma_2 p}{\gamma_2 p + \gamma_4(1-p)}[P_{EE} - P_{NE}] + P_{NE}$$

$$p(3) = P(E|\mathrm{NACK}, \mathrm{idle})P_{EE} + \big(1 - P(E|\mathrm{NACK}, \mathrm{idle})\big)P_{NE} = \frac{(1-\gamma_1)p}{(1-\gamma_1)p + (1-\gamma_3)(1-p)}[P_{EE} - P_{NE}] + P_{NE}$$

$$p(4) = P(E|\mathrm{ACK}, \mathrm{idle})P_{EE} + \big(1 - P(E|\mathrm{ACK}, \mathrm{idle})\big)P_{NE} = \frac{\gamma_1 p}{\gamma_1 p + \gamma_3(1-p)}[P_{EE} - P_{NE}] + P_{NE}$$

and

$$\begin{aligned}
V^1(p) &= \max\{ w r_p (p\gamma_1 + (1-p)\gamma_3),\; (1-w) r_s + w r_p (p\gamma_2 + (1-p)\gamma_4) \} \\
&= \max\{ w r_p p(\gamma_1 - \gamma_3) + w r_p \gamma_3,\; (1-w) r_s + w r_p p(\gamma_2 - \gamma_4) + w r_p \gamma_4 \}
\end{aligned}$$

Let $\gamma_3 > \gamma_1$ and $\gamma_4 > \gamma_2$, an intuitive assumption indicating that the
probability of receiving an ACK in the Non-erasure state is greater than that in the Erasure state.
Thus, $V^1(p)$ is a convex and non-increasing function of p.

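Similar to the Erasure/Non-Erasure channel model, we use induction to show that $V^K(p)$ is
convex and non-increasing in its argument. Assume that $V^{K-1}(p)$ is a convex and
non-increasing function. We have

$$\begin{aligned}
V^K(p) = \max\Big\{ & w r_p (p\gamma_1 + (1-p)\gamma_3) + (p\gamma_1 + (1-p)\gamma_3)V^{K-1}(p(4)) + (1 - p\gamma_1 - (1-p)\gamma_3)V^{K-1}(p(3)), \\
& (1-w) r_s + w r_p (p\gamma_2 + (1-p)\gamma_4) + (p\gamma_2 + (1-p)\gamma_4)V^{K-1}(p(2)) + (1 - p\gamma_2 - (1-p)\gamma_4)V^{K-1}(p(1)) \Big\}
\end{aligned} \tag{107}$$

The derivatives of the updated beliefs with respect to p are

$$\frac{dp(1)}{dp} = \frac{[(1-\gamma_2)p + (1-\gamma_4)(1-p)](1-\gamma_2) - (1-\gamma_2)p[(1-\gamma_2) - (1-\gamma_4)]}{[(1-\gamma_2)p + (1-\gamma_4)(1-p)]^2}[P_{EE} - P_{NE}] = \frac{(1-\gamma_2)(1-\gamma_4)}{[(1-\gamma_2)p + (1-\gamma_4)(1-p)]^2}[P_{EE} - P_{NE}]$$

$$\frac{dp(2)}{dp} = \frac{[\gamma_2 p + \gamma_4(1-p)]\gamma_2 - \gamma_2 p[\gamma_2 - \gamma_4]}{[\gamma_2 p + \gamma_4(1-p)]^2}[P_{EE} - P_{NE}] = \frac{\gamma_2\gamma_4}{[\gamma_2 p + \gamma_4(1-p)]^2}[P_{EE} - P_{NE}]$$

Similarly,

$$\frac{dp(3)}{dp} = \frac{(1-\gamma_1)(1-\gamma_3)}{[(1-\gamma_1)p + (1-\gamma_3)(1-p)]^2}[P_{EE} - P_{NE}]$$

$$\frac{dp(4)}{dp} = \frac{\gamma_1\gamma_3}{[\gamma_1 p + \gamma_3(1-p)]^2}[P_{EE} - P_{NE}]$$

Since $P_{EE} > P_{NE}$, we can see that $\frac{dp(1)}{dp}$, $\frac{dp(2)}{dp}$, $\frac{dp(3)}{dp}$,
and $\frac{dp(4)}{dp}$ are greater than or equal to zero.

This means that p(1), p(2), p(3), and p(4) are non-decreasing in p.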
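The four belief updates share one form: a Bayes update of the erasure belief given the ACK/NACK observation, followed by one step of the Markov chain. The sketch below implements that common form, evaluates p(1) to p(4) on a grid, and compares the analytic derivative with a central finite difference. The function names and the values chosen for $\gamma_1, \dots, \gamma_4$, $P_{EE}$, and $P_{NE}$ are illustrative assumptions.

```python
import numpy as np

def next_belief(p, gamma_e, gamma_ne, ack, p_ee, p_ne):
    """Erasure belief for the next slot after hearing an ACK (ack=True) or a NACK.

    gamma_e / gamma_ne: ACK probabilities in the erasure / non-erasure state
    for the action taken in the current slot.
    """
    like_e = gamma_e if ack else 1.0 - gamma_e
    like_ne = gamma_ne if ack else 1.0 - gamma_ne
    posterior = like_e * p / (like_e * p + like_ne * (1.0 - p))   # Bayes rule
    return posterior * (p_ee - p_ne) + p_ne                       # one Markov step

def next_belief_slope(p, gamma_e, gamma_ne, ack, p_ee, p_ne):
    """Analytic derivative of next_belief with respect to p."""
    like_e = gamma_e if ack else 1.0 - gamma_e
    like_ne = gamma_ne if ack else 1.0 - gamma_ne
    return like_e * like_ne * (p_ee - p_ne) / (like_e * p + like_ne * (1.0 - p)) ** 2

# Assumed example values with gamma_3 > gamma_1, gamma_4 > gamma_2, P_EE > P_NE.
g1, g2, g3, g4 = 0.1, 0.05, 0.9, 0.3
p_ee, p_ne = 0.8, 0.2
p = np.linspace(0.0, 1.0, 201)
p1 = next_belief(p, g2, g4, False, p_ee, p_ne)   # NACK while transmitting
p2 = next_belief(p, g2, g4, True, p_ee, p_ne)    # ACK while transmitting
p3 = next_belief(p, g1, g3, False, p_ee, p_ne)   # NACK while idle
p4 = next_belief(p, g1, g3, True, p_ee, p_ne)    # ACK while idle

h = 1e-6
numeric_slope = (next_belief(0.3 + h, g1, g3, True, p_ee, p_ne)
                 - next_belief(0.3 - h, g1, g3, True, p_ee, p_ne)) / (2 * h)
analytic_slope = next_belief_slope(0.3, g1, g3, True, p_ee, p_ne)
```

All four curves start at $P_{NE}$ for p = 0 and end at $P_{EE}$ for p = 1, and a NACK always leaves a belief at least as large as an ACK under the same action.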
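Let us start with the monotonicity proof of $V^K(p)$ in Equation (107).

The terms $w r_p (p\gamma_1 + (1-p)\gamma_3) = w r_p (\gamma_3 - p[\gamma_3 - \gamma_1])$ and
$(1-w) r_s + w r_p (p\gamma_2 + (1-p)\gamma_4)$ are non-increasing in p, as $\gamma_3 > \gamma_1$
and $\gamma_4 > \gamma_2$.

Assuming that $V^{K-1}(p)$ is non-increasing in its argument, we now study the monotonicity of
$\Omega_1(p)$, where

$$\Omega_1(p) = (p\gamma_1 + (1-p)\gamma_3)V^{K-1}(p(4)) + (1 - p\gamma_1 - (1-p)\gamma_3)V^{K-1}(p(3))$$

$$\begin{aligned}
\frac{d\Omega_1(p)}{dp} ={}& -(\gamma_3 - \gamma_1)V^{K-1}(p(4)) + (p\gamma_1 + (1-p)\gamma_3)\frac{dV^{K-1}(p(4))}{dp(4)}\frac{dp(4)}{dp} \\
& + (\gamma_3 - \gamma_1)V^{K-1}(p(3)) + (1 - p\gamma_1 - (1-p)\gamma_3)\frac{dV^{K-1}(p(3))}{dp(3)}\frac{dp(3)}{dp} \\
={}& (p\gamma_1 + (1-p)\gamma_3)\frac{dV^{K-1}(p(4))}{dp(4)}\frac{dp(4)}{dp} + (1 - p\gamma_1 - (1-p)\gamma_3)\frac{dV^{K-1}(p(3))}{dp(3)}\frac{dp(3)}{dp} \\
& - (\gamma_3 - \gamma_1)\big[V^{K-1}(p(4)) - V^{K-1}(p(3))\big]
\end{aligned} \tag{108}$$

Note that

$$\begin{aligned}
p(3) - p(4) &= \left[\frac{(1-\gamma_1)p}{(1-\gamma_1)p + (1-\gamma_3)(1-p)} - \frac{\gamma_1 p}{\gamma_1 p + \gamma_3(1-p)}\right][P_{EE} - P_{NE}] \\
&= \frac{(\gamma_3 - \gamma_1)(1-p)p}{[\gamma_1 p + \gamma_3(1-p)][(1-\gamma_1)p + (1-\gamma_3)(1-p)]}[P_{EE} - P_{NE}] \ge 0
\end{aligned}$$

i.e., $p(3) \ge p(4)$, and since $V^{K-1}(p)$ is non-increasing in its argument,
$V^{K-1}(p(4)) \ge V^{K-1}(p(3))$.

In Equation (108), since $\frac{dp(3)}{dp}$ and $\frac{dp(4)}{dp}$ are $\ge 0$, and
$\frac{dV^{K-1}(p(3))}{dp(3)}$ and $\frac{dV^{K-1}(p(4))}{dp(4)}$ are $\le 0$, we have
$\frac{d\Omega_1(p)}{dp} \le 0$, i.e., $\Omega_1(p)$ is non-increasing in p.
The same can be shown for $\Omega_2(p)$, where

$$\Omega_2(p) = (p\gamma_2 + (1-p)\gamma_4)V^{K-1}(p(2)) + (1 - p\gamma_2 - (1-p)\gamma_4)V^{K-1}(p(1))$$

by just replacing $\gamma_1$, $\gamma_3$, p(3), and p(4) by $\gamma_2$, $\gamma_4$, p(1), and p(2),
respectively.

Since $V^1(p)$ is monotonically non-increasing in p, by induction $V^K(p)$ is non-increasing in p.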

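Now, we prove the convexity of $V^K(p)$.

The linear terms $w r_p (p\gamma_1 + (1-p)\gamma_3)$ and $(1-w) r_s + w r_p (p\gamma_2 + (1-p)\gamma_4)$
are convex. Assuming $V^{K-1}(p)$ is convex in p,

$$\begin{aligned}
\frac{d^2\Omega_1(p)}{dp^2} ={}& 2(\gamma_3 - \gamma_1)\frac{dV^{K-1}(p(3))}{dp(3)}\frac{dp(3)}{dp} - 2(\gamma_3 - \gamma_1)\frac{dV^{K-1}(p(4))}{dp(4)}\frac{dp(4)}{dp} \\
& + (p\gamma_1 + (1-p)\gamma_3)\left[\frac{d^2V^{K-1}(p(4))}{dp(4)^2}\left(\frac{dp(4)}{dp}\right)^2 + \frac{dV^{K-1}(p(4))}{dp(4)}\frac{d^2p(4)}{dp^2}\right] \\
& + (1 - p\gamma_1 - (1-p)\gamma_3)\left[\frac{d^2V^{K-1}(p(3))}{dp(3)^2}\left(\frac{dp(3)}{dp}\right)^2 + \frac{dV^{K-1}(p(3))}{dp(3)}\frac{d^2p(3)}{dp^2}\right]
\end{aligned} \tag{109}$$

where

$$\frac{d^2p(3)}{dp^2} = -\frac{2(1-\gamma_1)(1-\gamma_3)(\gamma_3 - \gamma_1)}{[(1-\gamma_1)p + (1-\gamma_3)(1-p)]^3}[P_{EE} - P_{NE}]$$

$$\frac{d^2p(4)}{dp^2} = \frac{2\gamma_1\gamma_3(\gamma_3 - \gamma_1)}{[\gamma_1 p + \gamma_3(1-p)]^3}[P_{EE} - P_{NE}]$$

Substituting these expressions, and noting that $p\gamma_1 + (1-p)\gamma_3$ and
$1 - p\gamma_1 - (1-p)\gamma_3$ equal the denominators of p(4) and p(3), respectively, the terms in
the first derivatives of $V^{K-1}$ cancel, which leaves

$$\frac{d^2\Omega_1(p)}{dp^2} = (p\gamma_1 + (1-p)\gamma_3)\frac{d^2V^{K-1}(p(4))}{dp(4)^2}\left(\frac{dp(4)}{dp}\right)^2 + (1 - p\gamma_1 - (1-p)\gamma_3)\frac{d^2V^{K-1}(p(3))}{dp(3)^2}\left(\frac{dp(3)}{dp}\right)^2 \tag{110}$$

Since $\frac{d^2V^{K-1}(p(4))}{dp(4)^2}$ and $\frac{d^2V^{K-1}(p(3))}{dp(3)^2}$ are $\ge 0$, we have
$\frac{d^2\Omega_1(p)}{dp^2} \ge 0$. The same procedure can be applied to prove that
$\frac{d^2\Omega_2(p)}{dp^2} \ge 0$.

By induction, $V^K(p)$ is convex.

The two terms of $V^K(p)$ in Equation (107) are convex and non-increasing. Thus, the optimal
strategy is a threshold-based policy in p and there is a unique optimal solution.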
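The monotonicity and convexity established above can be observed numerically. The sketch below runs the recursion of Equation (107) on a belief grid, starting from $V^0 = 0$ so that the first pass yields $V^1(p)$, and returns the value function together with a mask of the beliefs where transmitting is preferred. The grid, the horizon, the linear interpolation, the starting point $V^0 = 0$, and all parameter values are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

def gilbert_elliot_value_iteration(w, r_p, r_s, p_ee, p_ne, gammas,
                                   horizon=30, n_grid=501):
    """Finite-horizon DP for Equation (107) on a belief grid.

    gammas = (g1, g2, g3, g4): ACK probabilities for (erasure, idle),
    (erasure, transmit), (non-erasure, idle), (non-erasure, transmit).
    """
    g1, g2, g3, g4 = gammas
    grid = np.linspace(0.0, 1.0, n_grid)
    ack_idle = grid * g1 + (1.0 - grid) * g3      # P(ACK | idle)
    ack_tx = grid * g2 + (1.0 - grid) * g4        # P(ACK | transmit)

    def updated_belief(like_e, like_ne):
        posterior = like_e * grid / (like_e * grid + like_ne * (1.0 - grid))
        return posterior * (p_ee - p_ne) + p_ne

    b1 = updated_belief(1.0 - g2, 1.0 - g4)       # p(1): NACK, transmit
    b2 = updated_belief(g2, g4)                   # p(2): ACK, transmit
    b3 = updated_belief(1.0 - g1, 1.0 - g3)       # p(3): NACK, idle
    b4 = updated_belief(g1, g3)                   # p(4): ACK, idle

    v = np.zeros(n_grid)                          # V^0 = 0 (assumed start)
    idle = tx = v
    for _ in range(horizon):
        idle = (w * r_p * ack_idle + ack_idle * np.interp(b4, grid, v)
                + (1.0 - ack_idle) * np.interp(b3, grid, v))
        tx = ((1.0 - w) * r_s + w * r_p * ack_tx + ack_tx * np.interp(b2, grid, v)
              + (1.0 - ack_tx) * np.interp(b1, grid, v))
        v = np.maximum(idle, tx)
    return grid, v, tx > idle

# Assumed example parameters.
grid, v, transmit_mask = gilbert_elliot_value_iteration(
    w=0.6, r_p=1.0, r_s=1.0, p_ee=0.8, p_ne=0.2, gammas=(0.1, 0.05, 0.9, 0.3))
is_non_increasing = bool(np.all(np.diff(v) <= 1e-9))
is_convex = bool(np.all(np.diff(v, 2) >= -1e-9))
print(is_non_increasing, is_convex)
```

The returned mask can be inspected to see for which beliefs the transmit branch attains the maximum under the chosen parameters.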
Fig. 1. Z-interference erasure channel model. The secondary transmitter, when active, causes interference on the primary
receiver. The secondary receiver, on the other hand, is shadowed from the primary transmitter, thereby suffering no interference
from it. ENC is the channel encoder, whereas DEC is the receive decoder.

Fig. 4. Two-state weighted sum throughput.

Fig. 5. Optimal number of consecutive transmitted packets.

Fig. 6. Primary-secondary rate region (axis label: primary throughput).

Fig. 7. Three-state weighted sum throughput.

Fig. 8. Two-channel two-state weighted sum throughput.

Fig. 9. Weighted sum throughput for the Gilbert-Elliot Model.

Fig. 10. Transition probabilities estimation.

Legend: optimal throughput using actual probabilities; throughput using estimated probabilities with a 30-packet
observation vector; throughput using estimated probabilities with a 100-packet observation vector.

Fig. 11. Throughput degradation due to estimation error.

Fig. 12. Throughput degradation due to estimation error at weight w = 0.6.

Fig. 13. Transition probabilities estimation for the Gilbert-Elliot Model.



